1 Introduction



In this paper, we show the existence of a deterministic stationary optimal policy out of the class 
of randomized history-dependent policies for (unconstrained) discounted continuous-time Markov 
decision processes (CTMDPs) with unbounded rates and with Borel state and action spaces. CTMDPs
have been studied intensively since the 1960s, and their formal constructions are available in [TJ
for deterministic stationary policies, in [2J5] for deterministic Markov policies, and in |13) for ran- 
domized Markov policies. The first rigorous construction allowing deterministic history-dependent 
policies is in [26] [28] , where the author viewed CTMDPs under deterministic history-dependent 
policies as special semi-Markov decision processes (SMDPs) whose actions are taken from spaces of 
measurable mappings. The first successful construction of CTMDPs allowing randomized history- 
dependent policies is in (TS], which is based on [16] . As noted in although the construction 
in [26] [28] is restricted to deterministic history-dependent policies, it can be modified to allow
randomized history-dependent policies. In this connection, Yushkevich's construction is indeed
equivalent to Kitaev's construction. To the best of our knowledge, Kitaev's construction currently
provides the standard setup for CTMDPs allowing randomized history-dependent policies, on which we
base the present work. A brief reminder of this construction is provided below.

The expected total discounted cost has been a common optimality criterion for CTMDP
optimization problems, and the existence of an optimal policy for discounted CTMDPs has been
studied by numerous authors, see for example, (3J [T7J [37] . In greater detail, [IT] is restricted to
deterministic Markov policies, [37] considers deterministic history-dependent policies, while [U [22]
take randomized history-dependent policies into consideration. It should be emphasized that
all of them assume uniformly bounded transition rates. In contrast, [3] [5] study discounted
CTMDPs allowing transition rates that are not uniformly bounded. However, the conditions assumed
therein are difficult to verify, as some of them are not imposed directly on the primitives but
on the transition probability functions. Later on, there have been developments in the direction
of only imposing conditions on the primitives, while still allowing unbounded transition rates,
see [5J [55] and the relevant chapters in the monograph [9]. It should be noted that all of the
aforementioned works allowing unbounded transition rates are restricted to the class of randomized
Markov policies. As a matter of fact, according to [7], the study of CTMDPs with the combination
of randomized history-dependent policies and unbounded transition rates had been an open problem
for over thirty years. To the best of our knowledge, the first successful treatment of such CTMDPs is
given by [TU], where the state space is countable.

In the present paper, we consider a more general case by allowing randomized history-dependent
policies, unbounded transition rates, and Borel state and action spaces, while all
our conditions are imposed on the primitives. The cost rates, which are allowed to be unbounded (both
from below and above), are also more general than those considered in [4] [5] [6] [7] [8] [9] [10] and many
others.

The main contributions of the present paper are threefold. Under the imposed conditions on
the primitives, we first show the regularity of the controlled process under any given randomized
history-dependent policy, which allows a formal statement of the optimization problem. Then we develop
the dynamic programming approach by showing that the optimal value of the problem satisfies the
corresponding Bellman equation. Finally, we establish the existence of a deterministic stationary
optimal policy. In relation to the most recent literature on this topic, the present work refines [5]
by considering randomized history-dependent policies (see footnote 2) and extends [TU] to the case of Borel state
spaces and more general cost rates.

The rest of this paper is organized as follows. In Section 2 we briefly describe Kitaev's
construction for CTMDPs, and present some preliminary results including the regularity, Kolmogorov's
forward equations and Dynkin's formula for the controlled processes, which need not be Markov.
In Section 3 we present the main statements. Section 4 contains a new example. We finish this
paper with a conclusion in Section 5. Several statements presented in this paper appeared without
proofs in [25].

2 Preliminaries 

The following notation is frequently used throughout this paper. $I$ stands for the indicator
function. $\delta_x(\cdot)$ is the Dirac measure concentrated at $x$. $\mathcal{B}(X)$ is the Borel $\sigma$-algebra of the Borel
space $X$. $\mathcal{F}_1 \vee \mathcal{F}_2$ is the smallest $\sigma$-algebra containing the two $\sigma$-algebras $\mathcal{F}_1$ and $\mathcal{F}_2$.
$\mathbb{R}_+ = (0,\infty)$. $\mathbb{R}^0_+ = [0,\infty)$. $\mathbb{Z}^0_+ = \mathbb{N} \cup \{0\}$. The abbreviation s.t. (resp. a.s.) stands for "subject to" (resp.
"almost surely").

2.1 Kitaev's construction 

The material presented in this subsection is mainly from [T5J [TH] [55].
The primitives of discounted CTMDPs are the following elements:

• state space: (S,B(S)) (arbitrary Borel), 

• action space: (A,B(A)) (arbitrary Borel), 

• admissible action space $A(x) \in \mathcal{B}(A)$ and the space of admissible action-state pairs $K =
\{(x,a) \in S \times A : a \in A(x)\} \in \mathcal{B}(S \times A)$, assumed to contain the graph of a measurable
function $\phi$ from $S$ to $A$ such that $\forall\, x \in S$, $\phi(x) \in A(x)$,

• transition rate: $q(dy|x,a)$, a signed kernel on $\mathcal{B}(S)$ given $(x,a) \in K$, taking nonnegative
values on $\Gamma_S \setminus \{x\}$ with $\Gamma_S \in \mathcal{B}(S)$, being conservative in the sense of $q(S|x,a) = 0$ and
stable in that $\bar q_x = \sup_{a \in A(x)} q_x(a) < \infty$, where $q_x(a) = -q(\{x\}|x,a)$,

• cost rate: $c_0(x,a)$ measurable in $(x,a) \in K$,

• discount factor: $\alpha > 0$,

• initial distribution: $\gamma(\cdot)$, a probability measure on $(S, \mathcal{B}(S))$.

2 In comparison, [8] only considers a specific class of Markov policies, under which the resulting (nonhomogeneous)
transition rates are required to be continuous in time, merely for the sake of validating the relevant results from [3].
In our opinion, this continuity is not needed.

Incidentally, we recall that a singleton $\{x\} \subset S$ is measurable, and $q_x(a)$ is measurable on $K$, see
[1, Prop. 7.29]. In what follows, for the sake of formality, if needed, $\forall\, \Gamma_S \in \mathcal{B}(S)$, we may consider
$q(\Gamma_S|x,a)$ as its measurable extension on $S \times A$, where $q(\Gamma_S|x,a) = 0$ on $(S \times A) \setminus K$, and similar
assertions are applicable to other functions such as $c_0$, and so on. This is just a convention, see
QU Chap.6].

Given the above primitives, let us recall the construction of the underlying stochastic basis
$(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \ge 0}, P^\pi_\gamma)$ and the controlled process $\{\xi_t, t \ge 0\}$ thereon, as given in [TB] (see also [H?l [22]
for more details). This is done in four steps.

Step 1: measurable space $(\Omega, \mathcal{F})$. Having firstly defined the measurable space $(\Omega^0, \mathcal{F}^0) =
((S \times \mathbb{R}_+)^\infty, \mathcal{B}((S \times \mathbb{R}_+)^\infty))$, let us adjoin all the sequences of the form

\[ (x_0, \theta_1, x_1, \ldots, \theta_{m-1}, x_{m-1}, \infty, x_\infty, \infty, x_\infty, \ldots) \]

to $\Omega^0$, where $x_0 \in S$, $x_\infty \notin S$ is an isolated point, $m \ge 1$ is some integer, $\theta_l \in \mathbb{R}_+$ and $x_l \ne x_\infty$ for
all nonnegative integers $l \le m-1$. After the corresponding modification of the $\sigma$-algebra $\mathcal{F}^0$, we
obtain the basic measurable space $(\Omega, \mathcal{F})$.

Step 2: stochastic process $\{\xi_t, t \ge 0\}$ and history $\{\mathcal{F}_t\}_{t \ge 0}$. Putting $T_0 = 0$, $T_m = \theta_1 + \theta_2 +
\cdots + \theta_m$, $T_\infty = \lim_{m \to \infty} T_m$, we can define the process of interest:

\[ \xi_t(\omega) = \sum_{m \ge 0} I\{T_m \le t < T_{m+1}\}\, x_m + I\{T_\infty \le t\}\, x_\infty \]

together with the history it is adapted to:

\[ \mathcal{F}_t = \sigma\big(\{T_m \le s,\ x_m \in \Gamma_S\} : \Gamma_S \in \mathcal{B}(S),\ s \le t,\ m \ge 0\big). \]

In what follows, as usual, $\omega = (x_0, \theta_1, x_1, \ldots)$ is often omitted, and $h_m(\omega) = (x_0, \theta_1, \ldots, \theta_m, x_m)$ is
referred to as an m-component history. Here, $\theta_m$ (resp. $T_m$, $x_m$) can be understood as the sojourn
times (resp. the jump moments, the state of the process on the interval $[T_m, T_{m+1})$). We do not
intend to consider the process after $T_\infty$: the isolated point $x_\infty$ will be regarded as absorbing.
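The definition of the process in Step 2 can be illustrated by a small sketch that evaluates the piecewise-constant trajectory for one sample point (x_0, theta_1, x_1, ...). The function names `jump_moments` and `xi` are illustrative only, and the sketch assumes a finite list of jumps followed by an infinite sojourn time, so that the accumulation point of the jump moments is never reached.

```python
import math
from itertools import accumulate


def jump_moments(sojourns):
    """Return [T_0, T_1, ..., T_m] with T_0 = 0 and T_k = theta_1 + ... + theta_k."""
    return [0.0] + list(accumulate(sojourns))


def xi(t, states, sojourns):
    """Value of the piecewise-constant process at time t >= 0.

    states   = [x_0, x_1, ..., x_m]
    sojourns = [theta_1, ..., theta_m]
    Assumption of this sketch: theta_{m+1} is infinite, i.e. the trajectory
    stays in x_m after the last listed jump moment T_m.
    """
    if len(states) != len(sojourns) + 1:
        raise ValueError("need exactly one more state than sojourn times")
    moments = jump_moments(sojourns) + [math.inf]
    for m, state in enumerate(states):
        # xi_t = x_m on the half-open interval [T_m, T_{m+1})
        if moments[m] <= t < moments[m + 1]:
            return state
    raise ValueError("t must be nonnegative")
```

For example, with states `[0, 2, 1]` and sojourn times `[1.5, 0.5]`, the trajectory is right-continuous: it equals 0 on [0, 1.5), 2 on [1.5, 2) and 1 afterwards.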

Step 3: policy $\pi$. Having adjoined the isolated point $a_\infty$ to $A$, we define $A_\infty = A \cup \{a_\infty\}$
and put $A(x_\infty) = \{a_\infty\}$. Similarly, $S_\infty = S \cup \{x_\infty\}$. Denoting $\mathcal{F}_{s-} = \bigvee_{t<s} \mathcal{F}_t$, the predictable
(with respect to $\{\mathcal{F}_t\}_{t \ge 0}$) $\sigma$-algebra on $\Omega \times \mathbb{R}^0_+$ is given by $\mathcal{P} = \sigma(\Gamma \times \{0\}\ (\Gamma \in \mathcal{F}_0),\ \Gamma \times (s, \infty)\ (\Gamma \in
\mathcal{F}_{s-}))$. See [HJ Chap. 4] for more details. Now the following definitions are in order:

• Randomized history-dependent policy: $\pi(\cdot|\omega,t)$, a $\mathcal{P}$-measurable transition probability function
on $(A_\infty, \mathcal{B}(A_\infty))$, concentrated on $A(\xi_{t-})$. Below, $U$ is the set of all such policies.

• Randomized Markov policy: $\pi(\cdot|\omega,t) = \pi^m(\cdot|\xi_{t-}(\omega),t)$. Here, concerning the RHS, $\pi^m(\cdot|x,t)$
is $\mathcal{B}(S_\infty \times \mathbb{R}^0_+)$-measurable.

• Randomized stationary policy: $\pi(\cdot|\omega,t) = \pi^s(\cdot|\xi_{t-}(\omega))$. Here, concerning the RHS, $\pi^s(\cdot|x)$ is
$\mathcal{B}(S_\infty)$-measurable.

• Deterministic stationary policy: $\pi(\cdot|\omega,t) = I\{\cdot \ni \phi(\xi_{t-}(\omega))\}$, where $\phi : S_\infty \to A_\infty$ is a
measurable mapping. Such policies are denoted as $\phi$.
Remark 1 The term "randomized policies" is adopted from JH [18] [22]. However, under a randomized
policy, it does not mean that decisions are made randomly continuously in time, which is
not always possible (see Sec. 7]). In fact, the term randomized policies should be understood
as relaxed control policies, as remarked in [19, Chap. 4]. Throughout this paper, the most general
policy under consideration is randomized history-dependent.

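Step 4: ($\gamma$, $\pi$-dependent) probability measure $P^\pi_\gamma$ on $(\Omega, \mathcal{F})$. Under any fixed policy $\pi$, let us
define

\[ \nu^\pi(\omega, \Gamma_S \times dt) = \Lambda(\Gamma_S|\omega,t)\,dt = \left[ \int_A \pi(da|\omega,t)\, q(\Gamma_S \setminus \{\xi_{t-}\} \,|\, \xi_{t-}, a) \right] dt, \tag{1} \]

where $\Gamma_S \in \mathcal{B}(S)$, and the obvious dependence of $\Lambda$ on $\pi$ has been omitted. This random measure
is predictable, see [THJ HH [52] . According to [HJ Chap. 4] (see also [IB]), the "jump intensity" $\Lambda$
has the following form:

\[ \Lambda(dy|\omega,t) = \sum_{m \ge 0} I\{T_m < t \le T_{m+1}\}\, \Lambda^m(dy|x_0, \theta_1, \ldots, x_m, t - T_m) + I\{t = 0\}\, \Lambda^0(dy|x_0), \tag{2} \]

where $\forall\, \Gamma_S \in \mathcal{B}(S)$, $\Lambda^m(\Gamma_S|x_0, \theta_1, \ldots, x_m, u)$ are some nonnegative, non-random measurable functions.
Then comparing (1) with (2), we have the explicit formulae (see footnote 3) for $\Lambda^m$:

\[ \Lambda^m(dy|x_0, \theta_1, \ldots, x_m, u) = \int_A \pi(da|x_0, \theta_1, \ldots, x_m, u + T_m)\, q(dy \setminus \{x_m\} \,|\, x_m, a). \tag{3} \]

Let $H_0 = S$ and $H_m = S \times ((0, \infty] \times S_\infty)^m$, $m = 1, 2, \ldots$. The marginal of $P^\pi_\gamma$ on $H_0$ coincides
with $\gamma$ (see footnote 4). Suppose that $P^\pi_\gamma$ on $H_m$ for $1 \le m \le k$ has been constructed. Now it is only needed to
construct $P^\pi_\gamma$ on $H_{k+1}$. But this can be done via

\[ P^\pi_\gamma(\Gamma_{H_k} \times (du \times dy)) = \int_{\Gamma_{H_k}} P^\pi_\gamma(dh_k)\, I\{\theta_k < \infty\}\, \Lambda^k(dy|h_k,u)\, e^{-\int_0^u \Lambda^k(S|h_k,v)\,dv}\, du; \]
\[ P^\pi_\gamma(\Gamma_{H_k} \times (\infty, x_\infty)) = \int_{\Gamma_{H_k}} P^\pi_\gamma(dh_k) \left\{ I\{\theta_k = \infty\} + I\{\theta_k < \infty\}\, e^{-\int_0^\infty \Lambda^k(S|h_k,v)\,dv} \right\}, \tag{4} \]

where $\Gamma_{H_k} \in \mathcal{B}(H_k)$. It remains to apply induction and the Ionescu-Tulcea theorem [1, p. 140-141,
Prop. 7.28] to deduce that $P^\pi_\gamma$ is the unique probability measure on $(\Omega, \mathcal{F})$ such that its
projection (marginal) onto $H_m$ satisfies (4), $m = 0, 1, \ldots$. This gives rise to the stochastic basis
$(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \ge 0}, P^\pi_\gamma)$, which is always assumed to be complete.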
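For intuition, the inductive step (4) can be sketched in the simplest setting: a finite state space and a randomized stationary policy, for which the jump intensity in (3) does not depend on the elapsed time u, so the sojourn time is exponential. The dictionaries `policy` and `q`, the label `X_INF` and both function names are hypothetical choices made for this illustration; the general history-dependent case needs a time-varying hazard and is not covered here.

```python
import math
import random

X_INF = "x_inf"  # hypothetical label for the isolated absorbing point


def jump_intensity(x, policy, q):
    """Lambda(y | x) = sum_a pi(a | x) * q(y | x, a) over y != x, cf. (3).

    policy[x] maps actions to probabilities; q[(x, a)] maps target states to
    rates (a diagonal entry may be present and is ignored here).
    """
    intensity = {}
    for action, prob in policy[x].items():
        for y, rate in q[(x, action)].items():
            if y != x:
                intensity[y] = intensity.get(y, 0.0) + prob * rate
    return intensity


def sample_transition(x, policy, q, rng):
    """Sample (sojourn time, next state) from the current state x, cf. (4)."""
    intensity = jump_intensity(x, policy, q)
    total = sum(intensity.values())
    if total == 0.0:
        # No jump ever occurs: infinite sojourn time, then the isolated point.
        return math.inf, X_INF
    theta = rng.expovariate(total)  # survival function exp(-total * u)
    targets = list(intensity)
    y = rng.choices(targets, weights=[intensity[v] for v in targets])[0]
    return theta, y
```

Passing an explicit `random.Random` instance keeps the sampling reproducible, which is convenient when comparing simulated trajectories under different policies.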
In fact, according to [TB], if we define the random measure

\[ \mu(\omega, dt, dy) = \sum_{m \ge 1} I\{T_m < \infty\}\, I\{x_m \in dy\}\, I\{T_m \in dt\}, \tag{5} \]

then under any fixed policy $\pi$ and initial distribution $\gamma$, the above defined measure $P^\pi_\gamma$ on $(\Omega, \mathcal{F})$
is such that its projection on the 0-component history is $\gamma$, and $\nu^\pi$ defined by (1) is the dual
predictable projection of $\mu$ defined by (5). See [THl Chap. 4] for more details.

Below, when $\gamma(\cdot)$ is a Dirac measure concentrated at $x \in S$, we use the "degenerated" notation
$P^\pi_x$. Expectations with respect to $P^\pi_\gamma$ and $P^\pi_x$ are denoted as $E^\pi_\gamma$ and $E^\pi_x$, respectively.



3 In fact, since $\pi(\cdot|\omega,t)$ is $\mathcal{P}$-measurable, it also admits a representation similar to that of $\Lambda(\cdot|\omega,t)$ (see (2)). This is
because of [19, Chap. 4]. In this connection, to be absolutely rigorous, one should write $\pi^m(\cdot|x_0, \theta_1, \ldots, x_m, u)$ in
(3), rather than $\pi(\cdot|x_0, \theta_1, \ldots, x_m, u + T_m)$. Nevertheless, here and below, we omit that superscript $m$, and use the
notation $\pi(\cdot|x_0, \theta_1, \ldots, x_m, u + T_m)$ for $\pi^m(\cdot|x_0, \theta_1, \ldots, x_m, u)$. This is merely for brevity, as the context always
excludes any confusion; besides, the superscript $m$ has already been used to indicate a Markov policy.

4 Below, with some abuse of notation, we also use $P^\pi_\gamma$ for the marginals on $H_m$.
2.2 Properties of the controlled process and optimization problem statement

Condition 1 There exist a measurable (weight) function $w(x) \ge 1$ on $S$ and constants $\rho \ge 0$, $b \ge 0$
such that

(a) $\bigcup_l S_l = S$ and $\lim_{l \to \infty} \inf_{x \in S \setminus S_l} w(x) = \infty$ for an increasing system of measurable subsets
$S_l \subseteq S$.

(b) $\int_S q(dy|x,a)\, w(y) \le \rho w(x) + b$, $\forall\, x \in S$, $a \in A(x)$.

(c) For any $l$, $\sup_{x \in S_l} \bar q_x < \infty$, where $S_l$ has been defined in part (a), and $\bar q_x = \sup_{a \in A(x)} q_x(a)$.

Remark 2 Below, we assume $\rho > 0$, where $\rho$ is defined in Condition 1. This can be done without
loss of generality, because the case of $\rho = 0$ can always be considered by passing to the limit as
$\rho \to 0$, with $\rho > 0$. We emphasize that if Condition 1 is satisfied by $\rho = 0$, it is also satisfied by
any arbitrarily fixed $\rho > 0$.

Condition 1 is of a Lyapunov type. Theorem 1 shows that it guarantees the $\xi_t$ process to be
non-explosive.
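To see what the drift inequality in Condition 1(b) asks for, one can check it numerically on a toy model. The birth-death rates below (births at rate lam * (x + 1), deaths at rate a * x on the nonnegative integers) and the weight w(x) = 1 + x are assumptions chosen for illustration and are not taken from this paper; the transition rates are unbounded in x, yet the drift is at most lam * w(x), so rho = lam and b = 0 work.

```python
def drift(x, a, lam, w):
    """Integral of w against q(dy | x, a) for the hypothetical birth-death rates.

    Births x -> x + 1 occur at rate lam * (x + 1); deaths x -> x - 1 at rate a * x.
    Because q is conservative, the integral equals the sum over y != x of
    q(y | x, a) * (w(y) - w(x)).
    """
    births = lam * (x + 1) * (w(x + 1) - w(x))
    deaths = a * x * (w(x - 1) - w(x)) if x > 0 else 0.0
    return births + deaths


def satisfies_drift_condition(lam, a_max, rho, b, w, n_states=500, n_actions=11):
    """Check drift <= rho * w(x) + b on a finite grid of states and actions."""
    actions = [a_max * k / (n_actions - 1) for k in range(n_actions)]
    return all(
        drift(x, a, lam, w) <= rho * w(x) + b + 1e-12
        for x in range(n_states)
        for a in actions
    )


def weight(x):
    """Hypothetical weight function w(x) = 1 + x."""
    return 1.0 + x
```

A grid check of this kind only gives evidence, not a proof; here the inequality can also be confirmed by hand, since the drift equals lam * (x + 1) - a * x.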

Condition 2 (a) $\int_S \gamma(dy)\, w(y) < \infty$, where $\gamma$ is the given initial distribution.

(b) $\alpha > \rho$, where $\alpha$ is the discount factor, and $\rho$ is as in Condition 1.

(c) There exist constants $M \ge 0$ and $c \ge 0$ such that $|\inf_{a \in A(x)} c_0(x,a)| \le M w(x) + c$, $\forall\, x \in S$.

This condition guarantees that the performance functional (6) is well defined. Condition 2(c) is
a version of the one imposed in j^T] , where the author studies CTMDPs with bounded transition
rates and average criteria.

Theorem 1 Suppose Condition 1 is satisfied. Then under any policy $\pi \in U$, the following assertions
hold:

(a) For any given initial distribution $\gamma$, $P^\pi_\gamma(T_\infty = \infty) = 1$, and hence $\forall\, t \ge 0$, $P^\pi_\gamma(\xi_t \in S) = 1$. So
explosion does not occur. Moreover, for all $x \in S$, $t \ge 0$,

\[ E^\pi_x[w(\xi_t)] \le e^{\rho t} w(x) + \frac{b}{\rho}\left(e^{\rho t} - 1\right). \]

(b) If additionally Condition 2 is satisfied, then for any $\gamma$, the inequality

\[ V_0(\pi) \ge -\frac{M\left(\alpha \int_S \gamma(dy)\, w(y) + b\right)}{\alpha(\alpha - \rho)} - \frac{c}{\alpha} > -\infty \]

holds, where

\[ V_0(\pi) = E^\pi_\gamma\left[ \int_0^\infty e^{-\alpha t} \int_A c_0(\xi_{t-}, a)\, \pi(da|\omega,t)\, dt \right]. \tag{6} \]

We use the notation $V_0(x, \pi)$ if the initial distribution $\gamma$ is concentrated at state $x \in S$.
The proofs of this theorem and the other main statements presented in this paper can be found
in the appendix.
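The moment bound in Theorem 1(a) is easy to evaluate, and doing so makes the role of Remark 2 visible: as rho tends to zero, the bound tends to w(x) + b * t. The following is a minimal sketch; the function name `moment_bound` is illustrative, and the branch for rho = 0 simply encodes that limit.

```python
import math


def moment_bound(t, wx, rho, b):
    """Upper bound on E[w(xi_t)] from Theorem 1(a), given w(x) = wx.

    For rho > 0 this is exp(rho * t) * wx + (b / rho) * (exp(rho * t) - 1);
    for rho == 0 the limiting value wx + b * t is returned.
    """
    if rho == 0.0:
        return wx + b * t
    # expm1 keeps the second term accurate when rho * t is tiny
    return math.exp(rho * t) * wx + (b / rho) * math.expm1(rho * t)
```

At t = 0 the bound reduces to w(x), and it is nondecreasing in t for nonnegative rho and b.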

Theorem 1 implies that the following CTMDP optimization problem under consideration is
well defined:

\[ V_0(\pi) \to \min_{\pi \in U}. \tag{7} \]

Definition 1 Denote by $V_0^* = \inf_{\pi \in U} V_0(\pi)$ the optimal value of CTMDP (7). A policy $\pi^*$ is
called optimal, if $V_0(\pi^*) = V_0^*$. CTMDP (7) is called solvable, if such a $\pi^*$ exists.

Remark 3 Equality (1) holds $P^\pi_\gamma$-a.s., as well as all the subsequent equalities and inequalities
involving $\omega$. The values of integrals like (6) do not change if we replace $\xi_{t-}$ with $\xi_t$.
2.3 Auxiliary results

Generally speaking, $\bar q_x$ may fail to be measurable. However, according to jTTJ D.5 Prop.] (see also [1,
Prop. 7.33]), $\bar q_x$ is measurable on $S$ if the following condition is satisfied.

Condition 3 (a) $A(x)$ is compact, $\forall\, x \in S$.

(b) $q_x(a)$ is upper semicontinuous on $A(x)$, $\forall\, x \in S$.

Kolmogorov's forward equation (in the integral form) and Dynkin's formula are rather useful
tools for studying CTMDPs. When the controlled process is Markov, they are well known. For a randomized
history-dependent policy $\pi$, under the imposed conditions, it turns out that they still hold.

Condition 4 There exists a constant $L > 0$ such that $0 \le \bar q_x \le L w(x)$, $\forall\, x \in S$.

We need this condition to be sure that the last term in formula (9) is finite.

Theorem 2 (a) Suppose Condition 1 is satisfied. Then under any fixed policy $\pi$, $\forall\, x \in S$, $t \ge 0$,
$\forall\, \Gamma \in \mathcal{B}(S)$ such that $\exists\, l : \Gamma \subseteq S_l$, with $S_l$ being defined in Condition 1, Kolmogorov's forward
equation (in the integral form) holds:

\[ P^\pi_x(\xi_t \in \Gamma) = I\{x \in \Gamma\} + E^\pi_x\left[ \int_0^t \int_A \pi(da|\omega,u)\, q(\Gamma \setminus \{\xi_u\} \,|\, \xi_u, a)\, du \right] - E^\pi_x\left[ \int_0^t \int_A \pi(da|\omega,u)\, q_{\xi_u}(a)\, I\{\xi_u \in \Gamma\}\, du \right]. \tag{8} \]

(b) In part (a), if we replace Condition 1(c) by Condition 4, whereas all the other parts of Condition
1 are still satisfied, then we have the following stronger statement: $\forall\, \Gamma \in \mathcal{B}(S)$,

\[ P^\pi_x(\xi_t \in \Gamma) = I\{x \in \Gamma\} + E^\pi_x\left[ \int_0^t \int_A \pi(da|\omega,u)\, q(\Gamma \setminus \{\xi_u\} \,|\, \xi_u, a)\, du \right] - E^\pi_x\left[ \int_0^t \int_A \pi(da|\omega,u)\, q_{\xi_u}(a)\, I\{\xi_u \in \Gamma\}\, du \right]. \tag{9} \]

The expectations that appear in the above formulae are finite.

For the case of uniformly bounded $\bar q_x$, Kolmogorov's forward equation (9) has been established
in [THl Lem.4]. Throughout this paper, Condition 4 is only required for proving Theorem 2(b),
while Theorem 2(b) itself is never used elsewhere in this paper. However, it is needed in [24].
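As a sanity check of the integral form of the forward equation, one can take the simplest Markov special case: a two-state chain without control, where the transition probability is known in closed form. The rates `r01` (from state 0 to 1) and `r10` (from state 1 to 0) and the function names are assumptions of this sketch; it evaluates the right-hand side for x = 0 and the target set consisting of state 1 by the trapezoidal rule.

```python
import math


def prob_state_1(t, r01, r10):
    """P_0(xi_t = 1) for a two-state chain with rates r01 (0 -> 1) and r10 (1 -> 0)."""
    return r01 / (r01 + r10) * (1.0 - math.exp(-(r01 + r10) * t))


def forward_equation_rhs(t, r01, r10, steps=20000):
    """Right-hand side of the forward equation for x = 0 and target set {1}.

    The indicator term is 0 because the start state 0 is not in the target set.
    The inflow integrand is r01 * P(xi_u = 0); the outflow is r10 * P(xi_u = 1).
    """
    h = t / steps

    def integrand(u):
        p1 = prob_state_1(u, r01, r10)
        return r01 * (1.0 - p1) - r10 * p1

    total = 0.5 * (integrand(0.0) + integrand(t))
    total += sum(integrand(k * h) for k in range(1, steps))
    return 0.0 + h * total
```

Both sides agree up to quadrature error, which reflects that the integrand is exactly the time derivative of the transition probability in this special case.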

We need parts (a,b) of the next condition for establishing Dynkin's formula, where the product
$\bar q_{\xi_v} u(\xi_v)$ must be integrable for $u \in B_{w'}(S)$. (See Definition 2.)

Condition 5 There exist a measurable function $w'(x) \ge 1$ on $S$ and nonnegative constants $L'$, $\rho'$
and $b'$ such that the following assertions hold:

(a) $(\bar q_x + 1)\, w'(x) \le L' w(x)$, where $w$ comes from Condition 1.

(b) $\int_S q(dy|x,a)\, w'(y) \le \rho' w'(x) + b'$, $\forall\, x \in S$, $a \in A(x)$.

(c) $\alpha > \rho'$.

(d) There exist constants $M' \ge 0$ and $c' \ge 0$ satisfying $|\inf_{a \in A(x)} c_0(x,a)| \le M' w'(x) + c'$, $\forall\, x \in S$.

Condition 5(c,d) guarantees that the corresponding performance functional is well defined (cf.
Condition 2(b,c)). Under Condition 1 and Condition 5(a), $E^\pi_x[w'(\xi_t)] < \infty$ due to Theorem 1(a).



Definition 2 A measurable function $u$ on $S$ satisfying $\sup_{x \in S} \frac{|u(x)|}{w(x)} < \infty$ (resp. $\sup_{x \in S} \frac{|u(x)|}{w'(x)} <
\infty$) is said to have a bounded $w$-(resp. $w'$-) weighted norm, with the norm $\|u\|_w = \sup_{x \in S} \frac{|u(x)|}{w(x)}$
(resp. $\|u\|_{w'} = \sup_{x \in S} \frac{|u(x)|}{w'(x)}$). The collection of all functions $u$ on $S$ with a bounded $w$-(resp.
$w'$-) weighted norm is denoted by $B_w(S)$ (resp. $B_{w'}(S)$).
Theorem 3 Suppose Condition 1 and Condition 5(a,b) are satisfied. Then $\forall\, u \in B_{w'}(S)$, the
following two versions of Dynkin's formula hold:

\[ E^\pi_x[u(\xi_t)] - u(x) = E^\pi_x\left[ \int_0^t \int_S \int_A \pi(da|\omega,v)\, q(dy|\xi_v,a)\, u(y)\, dv \right]; \tag{10} \]

\[ u(x) = E^\pi_x\left[ \int_0^\infty e^{-\alpha v} \left\{ \alpha u(\xi_v) - \int_S \int_A \pi(da|\omega,v)\, q(dy|\xi_v,a)\, u(y) \right\} dv \right]. \tag{11} \]
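The discounted version of Dynkin's formula can be checked numerically in the same elementary two-state setting without control. All names below (`r01`, `r10`, `u0`, `u1`, `discounted_dynkin_rhs`) are assumptions of this sketch, and the infinite time horizon is truncated at a point where the discount factor makes the tail negligible.

```python
import math


def prob_state_1(t, r01, r10):
    """P_0(xi_t = 1) for a two-state chain with rates r01 (0 -> 1) and r10 (1 -> 0)."""
    return r01 / (r01 + r10) * (1.0 - math.exp(-(r01 + r10) * t))


def discounted_dynkin_rhs(alpha, r01, r10, u0, u1, horizon=40.0, steps=40000):
    """Discounted integral of alpha * u(xi_v) minus the generator term, from state 0.

    With d = u1 - u0, the expected value E[u(xi_v)] equals u0 + d * P(xi_v = 1),
    and the expected generator term equals
    d * (r01 * P(xi_v = 0) - r10 * P(xi_v = 1)).
    """
    d = u1 - u0
    h = horizon / steps

    def integrand(v):
        p1 = prob_state_1(v, r01, r10)
        generator = d * (r01 * (1.0 - p1) - r10 * p1)
        return math.exp(-alpha * v) * (alpha * (u0 + d * p1) - generator)

    total = 0.5 * (integrand(0.0) + integrand(horizon))
    total += sum(integrand(k * h) for k in range(1, steps))
    return h * total
```

For any choice of the values u0 and u1, the result should reproduce u0, the value of u at the initial state, up to quadrature and truncation error.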



3 Main statements 



Condition 6 (a) For any bounded nonnegative measurable function $u(y)$ on $S$ and fixed $x \in S$,
$u'(x,a) = \int_S u(y)\, q(dy|x,a)$ is lower semicontinuous in $a \in A(x)$.

(b) $\int_S w(y)\, q(dy|x,a)$ is continuous in $a \in A(x)$, $\forall\, x \in S$, where $w$ comes from Condition 1.

(c) $c_0(x,a)$ is lower semicontinuous in $a \in A(x)$, $\forall\, x \in S$.

(d) $A(x)$ is compact, $\forall\, x \in S$.

Remark 4 By reasoning similarly to \VA P-44]> one can show that Condition 6(a) is equivalent to
the following: for any $x \in S$ and bounded measurable function $u(y)$ on $S$, function $\int_S u(y)\, q(dy|x,a)$
is continuous in $a \in A(x)$. Therefore, Condition 6(a) is stronger than Condition 3(b).

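The next statement is similar to Theorem 3.3(b) in [8].

Theorem 4 Suppose Condition 1(b), Condition 2(b,c) and Condition 6 are satisfied. Then the
Bellman equation

\[ \alpha u(x) = \inf_{a \in A(x)} \left\{ c_0(x,a) + \int_S q(dy|x,a)\, u(y) \right\} \tag{12} \]

admits a solution $u^* \in B_w(S)$, which is given by the point-wise limit of the following non-increasing
sequence of measurable functions $\{u^{(n)}, n = 0, 1, \ldots\}$:

\[ u^{(0)}(x) = \frac{M(\alpha w(x) + b)}{\alpha(\alpha - \rho)} + \frac{c}{\alpha}; \]
\[ u^{(n+1)}(x) = \inf_{a \in A(x)} \left\{ \frac{c_0(x,a)}{\alpha + 1 + \bar q_x} + \frac{1 + \bar q_x}{\alpha + 1 + \bar q_x} \int_S u^{(n)}(y) \left( \frac{q(dy|x,a)}{1 + \bar q_x} + I\{x \in dy\} \right) \right\}. \tag{13} \]

For each $n = 0, 1, 2, \ldots$,

\[ |u^{(n)}(x)| \le \frac{M(\alpha w(x) + b)}{\alpha(\alpha - \rho)} + \frac{c}{\alpha}. \]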
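The iteration (13) can be tried out on a small finite model. The sketch below makes simplifying assumptions that are not part of the theorem: finitely many states and actions, the weight function identically equal to 1 with rho = b = c = 0, and M taken as the largest absolute cost rate, so that the starting function is the constant M / alpha. The dictionary layout and function names are hypothetical.

```python
def bellman_iteration(states, actions, q, c0, alpha, n_iter=2000):
    """Iterate the recursion (13) on a finite model.

    q[(x, a)] maps target states y to rates q(y | x, a), including the
    diagonal entry q(x | x, a) = -q_x(a), so that each row sums to zero.
    """
    q_bar = {x: max(-q[(x, a)].get(x, 0.0) for a in actions[x]) for x in states}
    m_const = max(abs(c0[(x, a)]) for x in states for a in actions[x])
    u = {x: m_const / alpha for x in states}  # starting function u^(0)
    for _ in range(n_iter):
        new_u = {}
        for x in states:
            denom = alpha + 1.0 + q_bar[x]
            # (c0 + sum_y q(y|x,a) u(y) + (1 + q_bar) u(x)) / denom is the
            # bracket in (13) written over a common denominator
            new_u[x] = min(
                (c0[(x, a)]
                 + sum(rate * u[y] for y, rate in q[(x, a)].items())
                 + (1.0 + q_bar[x]) * u[x]) / denom
                for a in actions[x]
            )
        u = new_u
    return u


def bellman_residual(u, states, actions, q, c0, alpha):
    """Largest absolute violation of the Bellman equation (12) over all states."""
    return max(
        abs(alpha * u[x] - min(
            c0[(x, a)] + sum(rate * u[y] for y, rate in q[(x, a)].items())
            for a in actions[x]))
        for x in states
    )
```

In this bounded finite case the update is a contraction with modulus at most (1 + q_bar) / (alpha + 1 + q_bar), so a fixed number of sweeps suffices; the unbounded Borel setting of the theorem relies instead on monotone convergence of the sequence.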



Remark 5 (a) Suppose Condition 5(b,c,d) is satisfied. If additionally Condition 6 (with $w$ being
replaced with $w'$ in its part (b)) is satisfied, then the statements of Theorem 4 are still valid, with
$w, M, c, \rho$ and $b$ being replaced by $w', M', c', \rho'$ and $b'$ everywhere. This remark can be verified by
repeating the reasoning used in the proof of Theorem 4 with obvious modifications.

(b) Condition 1(b), Condition 5(a) and Condition 6 altogether imply that $\int_S w'(y)\, q(dy|x,a)$ is
continuous in $a \in A(x)$ for each $x \in S$ (see \1'A Lem. 8.3.7.]).

Theorem 5 Suppose Condition 1, Condition 2(a,b), Condition 5 and Condition 6 are satisfied.
Then the following assertions hold:

(a) Suppose function $u^* \in B_{w'}(S)$ solves the Bellman equation (12). Then, for some deterministic
stationary policy $\phi^*$,

\[ \int_S \gamma(dy)\, u^*(y) = \inf_{\pi \in U} V_0(\pi) = V_0(\phi^*). \]

If a measurable map $\phi^* : x \to \phi^*(x) \in A(x)$ provides the infimum in (12), then policy $\phi^*$ is optimal.

(b) The Bellman equation (12) has a unique solution $u^*$ in the class $B_{w'}(S)$, which can be constructed
using iterations (13), where $w, M, c, \rho$ and $b$ should be replaced with $w', M', c', \rho'$ and $b'$.

(c) The Bellman function $u^*$ solves the following dual linear program (DLP) in the space of measurable
functions on $S$:

\[ \int_S \gamma(dy)\, v(y) \to \max \tag{14} \]

s.t.

\[ \frac{1}{\alpha} c_0(x,a) - v(x) + \frac{1}{\alpha} \int_S v(y)\, q(dy|x,a) \ge 0, \quad \forall\, (x,a) \in K; \qquad v \in B_{w'}(S). \]

(d) Suppose $v$ is feasible for DLP (14). Then it solves the DLP if and only if $v(x) = u^*(x)$ a.s.
(with respect to $\gamma$).



4 Example 

Consider a one-channel queuing system without any space for waiting: any job that finds the
server busy is rejected. We characterize every job by its volume $x \in (0, 1]$, so that the state space
is $S = [0, 1]$: $\xi_t = 0$ means the system is idle; $\xi_t = x \in (0, 1]$ means the corresponding job is under
service. We put A = [0, oo), and action a G A represents the service intensity. Let A(0) = and 



A(x) 



0, A 



, where A > is a constant. The jobs arrive according to a Poisson process with a 

fixed rate $\lambda > 0$, and the volume is distributed according to density $5x^4$, $x \in (0, 1]$, independently
of anything else. Therefore,

\[ q(\Gamma|0,a) = 5\lambda \int_{\Gamma \setminus \{0\}} y^4\, dy - \lambda I\{\Gamma \ni 0\}, \quad \forall\, \Gamma \in \mathcal{B}([0,1]). \]

For any fixed $x \in (0, 1]$, $a \in A(x)$, the service time of a job of volume $x$ is exponentially distributed
with parameter $\frac{a}{x}$, so that

\[ q(\Gamma|x,a) = I\{0 \in \Gamma, x \notin \Gamma\}\, \frac{a}{x} - I\{0 \notin \Gamma, x \in \Gamma\}\, \frac{a}{x}, \quad \forall\, \Gamma \in \mathcal{B}([0,1]). \]

We assume that when a served job leaves the system, it gives an income of one unit; the holding
cost of a job of volume $x \in (0, 1]$ equals $C_1 x$ per time unit; and the service intensity $a \in A$ is
associated with the cost rate $C_2 a^2$. Here $C_1 > 0$ and $C_2 > 0$ are two constants. Thus

\[ c_0(x,a) = C_1 x + C_2 a^2 - \frac{a}{x}, \quad \forall\, x \in (0, 1],\ a \in A(x), \]

and $c_0(0,0) = 0$. We emphasize that, as can be easily verified, $\bar q_x$ is unbounded, and $c_0(x,a)$ is
unbouned (from both above and below) when A > 
Finally, let a, the discount factor, be big enough: 

a > 4A, 

and let 7, the initial distribution, be such that 

f 1 1 

/ l(dy)— < 00. 

Jo y 

Theorem 6 (a) For the model described, all the conditions formulated in this paper are satisfied, 
(b) Suppose C\ > is small enough (or a is big) in that j 1 - < 1, and define 



\[ u(x,z) = -2\alpha C_2 x^2 - z + 2\sqrt{\alpha^2 C_2^2 x^4 + C_1 C_2 x^3 + \alpha C_2 x^2 z}, \quad \forall\, x \in (0,1],\ z \in [0,\infty). \tag{15} \]
Then the following recursion relations

\[ z^{(0)} = 0; \]
\[ u^{(n)}(x) = u(x, z^{(n)}) = -2\alpha C_2 x^2 - z^{(n)} + 2\sqrt{\alpha^2 C_2^2 x^4 + C_1 C_2 x^3 + \alpha C_2 x^2 z^{(n)}}, \quad x \in (0,1]; \]
\[ z^{(n+1)} = 1 - \frac{5\lambda}{\alpha + \lambda} \int_0^1 u^{(n)}(y)\, y^4\, dy, \quad n = 0, 1, 2, \ldots \]

converge: the sequence $\{z^{(n)}, n = 0, 1, \ldots\}$ is increasing and has a finite limit $z^* = \lim_{n \to \infty} z^{(n)}$,
and $\lim_{n \to \infty} u^{(n)}(x) = u(x, z^*) = u^*(x)$, $\forall\, x \in (0, 1]$.

(c) Suppose w 1 - < 1, and constant A is big enough in that the limiting function u*(x) satisfies 

inequality - ^ch* — x ^ (0> •"■]■ ^ erl w*(a;), supplemented at zero by the value u*(0) = 1 — z*, 
solves the Bellman equation \l c 2\) , and the deterministic stationary policy 

r(x) = u*(x) + g Vlg(01] w ^ (Q) = (16) 

is optimal. 
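The closed form (15) can be cross-checked against the Bellman equation (12) at a busy state x in (0, 1]. The sketch below assumes, following part (c) of Theorem 6, that the value at the idle state is 1 - z, and it assumes that the unconstrained minimiser of the quadratic in a lies in the admissible set A(x). Under these assumptions the bracket in (12) equals C1 * x + C2 * a^2 - (a / x) * (u + z), which is minimised at a = (u + z) / (2 * C2 * x). The function names are illustrative.

```python
import math


def u_closed_form(x, z, alpha, c1, c2):
    """Formula (15): candidate Bellman function at a busy state x in (0, 1]."""
    root = math.sqrt(alpha ** 2 * c2 ** 2 * x ** 4 + c1 * c2 * x ** 3
                     + alpha * c2 * x ** 2 * z)
    return -2.0 * alpha * c2 * x ** 2 - z + 2.0 * root


def pointwise_bellman_gap(x, z, alpha, c1, c2):
    """alpha * u(x) minus the minimised bracket of (12) at state x.

    Assumptions of this sketch: u(0) = 1 - z, and the unconstrained minimiser
    a_star is admissible.
    """
    u = u_closed_form(x, z, alpha, c1, c2)
    a_star = (u + z) / (2.0 * c2 * x)
    cost_rate = c1 * x + c2 * a_star ** 2 - a_star / x
    # the only possible jump from x is to the idle state 0, at rate a_star / x
    jump_term = (a_star / x) * ((1.0 - z) - u)
    return alpha * u - (cost_rate + jump_term)
```

The gap vanishes up to rounding for any admissible parameter values, and the closed form is decreasing in z for fixed x, in line with Remark 6(b).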

Remark 6 (a) If parameter $\bar A$ increases, the solution to this example does not change. We cannot
put $A(x) = [0, \infty)$ because in this case the transition rate becomes unstable: $\sup_{a \in A(x)} q_x(a) = +\infty$.

(b) It follows from the proof of Theorem^ that z* < ^C^a + and function u(x,z) defined 
by hi 5}) decreases with z for any fixed x G (0,1]. These observations allow us to estimate the 
admissible values of A. 

(c) In case $C_1$ is very big (see part (c) of Theorem 6), it can happen that action $a^* = 0$ becomes
optimal for small values of $\xi_t = x$. Indeed, if $a > 0$ then there can be transitions $x \to 0 \to y \to \ldots$
with a good chance to have a big value of $y$ leading to a big holding cost in the future. Thus, in
this situation it can be reasonable to select $a^* = 0$ and finish with the cost rate $C_1 x$, which is small
if $x$ is small.

5 Conclusion 

As mentioned in [TS], the standard results for (unconstrained) discounted CTMDPs include that
the model is well defined, the Bellman equation is satisfied, and there exists a deterministic
stationary optimal policy. In the present work, allowing policies as general as randomized
history-dependent ones, we obtain all such standard results for CTMDPs in Borel spaces. The conditions
we base our study on are imposed on the primitives, allowing unbounded transition and cost rates.
In particular, our conditions imposed on the cost rate are more general than those in all the papers
on discounted CTMDPs in the references. In this connection, the setup of the present paper is arguably
quite general.

We emphasize that our conditions are sufficient but not necessary for studying discounted
CTMDPs. For instance, we believe that the conditions imposed in [3S], which are different from
the conditions imposed here and still allow unbounded transition rates and cost rates, could
also be sufficient for obtaining the standard results presented in this paper. On the other hand,
there exists research on CTMDPs (see [15 ]) that is based only on necessary conditions,
which just require that the underlying models are well defined (no explosion happens), and so are
the expected total discounted costs (which can be positive or negative infinity). In such a general setup,
the authors of [13] obtain some nonstandard results for discounted CTMDPs in countable state
and action spaces.

Appendix 

In this appendix, we establish some lemmas, and prove the main statements. 
Lemma 1 Let a signed kernel $f(dy|x,t)$ on $\mathcal{B}(S)$ given $(x,t) \in S \times \mathbb{R}^0_+$ be fixed, and assume that
it satisfies the following: $f(\Gamma_S|x,t) \ge 0$ if $\Gamma_S \in \mathcal{B}(S)$ and $x \notin \Gamma_S$, $f(S \setminus \{x\}|x,t) < \infty$, and
$f(S|x,t) = 0$. Here, we put $F_x(t) = f(S \setminus \{x\}|x,t) < \infty$. Suppose there exist constants $\rho \ne 0$,
$b \ge 0$ and a measurable function $w(x) \ge 0$ on $S$ such that $\int_S f(dy|x,t)\, w(y) \le \rho w(x) + b$, $\forall\, x \in S$.
Then

\[ h(s,x,t) \ge \int_s^t e^{-\int_s^u F_x(v)\,dv} \int_{S \setminus \{x\}} f(dy|x,u)\, h(u,y,t)\, du + e^{-\int_s^t F_x(v)\,dv}\, w(x), \]

where $h$ is a nonnegative function defined by

\[ h(s,x,t) = e^{\rho(t-s)} w(x) + \frac{b}{\rho}\left(e^{\rho(t-s)} - 1\right), \quad \forall\, 0 \le s \le t,\ x \in S. \tag{17} \]

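Proof: Straightforward calculations result in

\[ \int_s^t e^{-\int_s^u F_x(v)\,dv} \left[ \int_{S \setminus \{x\}} f(dy|x,u)\, h(u,y,t) \right] du + e^{-\int_s^t F_x(v)\,dv}\, w(x) \]
\[ = \int_s^t e^{-\int_s^u F_x(v)\,dv}\, e^{\rho(t-u)} \left( \int_S f(dy|x,u)\, w(y) - f(\{x\}|x,u)\, w(x) \right) du \]
\[ \quad + \frac{b}{\rho} \int_s^t e^{-\int_s^u F_x(v)\,dv} \left( e^{\rho(t-u)} - 1 \right) F_x(u)\, du + e^{-\int_s^t F_x(v)\,dv}\, w(x) \]
\[ \le \int_s^t e^{-\int_s^u F_x(v)\,dv}\, e^{\rho(t-u)} \left( \rho w(x) + b + F_x(u)\, w(x) \right) du + \frac{b}{\rho} \int_s^t e^{-\int_s^u F_x(v)\,dv}\, e^{\rho(t-u)} F_x(u)\, du \]
\[ \quad - \frac{b}{\rho} \int_s^t e^{-\int_s^u F_x(v)\,dv}\, F_x(u)\, du + e^{-\int_s^t F_x(v)\,dv}\, w(x). \]

The rest of this proof now becomes identical to the one of O Lem.3.2(a), p. 239]. □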

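Corollary 1 Suppose Condition 1(b) is satisfied. If $\rho$ coming from Condition 1 is strictly positive,
then

\[ h(s,x,t) = h(0,x,t-s) \ge \int_s^t \left\{ e^{-\int_s^u \Lambda^l(S|x_0,\theta_1,\ldots,\theta_l,x,v)\,dv} \int_S \Lambda^l(dy|x_0,\theta_1,\ldots,\theta_l,x,u)\, h(u,y,t) \right\} du \]
\[ \quad + e^{-\int_s^t \Lambda^l(S|x_0,\theta_1,\ldots,\theta_l,x,v)\,dv}\, w(x), \quad \forall\, x \in S,\ 0 \le s \le t < \infty,\ l \in \mathbb{Z}^0_+, \tag{18} \]

where $h$ is given in (17).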

Proof: Let $l \in \mathbb{Z}^0_+$ be arbitrarily fixed. Consider the signed kernel on $\mathcal{B}(S)$ given $(x,u) \in S \times \mathbb{R}^0_+$,
defined by $\forall\, \Gamma_S \in \mathcal{B}(S)$,

\[ g_l(\Gamma_S|x,u) = \begin{cases} \Lambda^l(\Gamma_S|x_0,\theta_1,\ldots,\theta_l,x,u) & \text{if } x \notin \Gamma_S; \\ -\Lambda^l(S|x_0,\theta_1,\ldots,\theta_l,x,u) & \text{if } \Gamma_S = \{x\}, \end{cases} \]

where $\Lambda^l$ is defined in (3). It can be easily verified that all the conditions in Lemma 1 are satisfied
by $b \ge 0$, $\rho > 0$, $w(x) \ge 1$ (coming from Condition 1) and this signed kernel $g_l(\cdot|x,u)$. Now the
statement follows from Lemma 1. □
Lemma 2 Suppose Condition 1(b) is satisfied. Then under any policy $\pi$, $\forall\, x \in S$, $m = 0, 1, 2, \ldots$,

\[ E^\pi_x[w(\xi_t)\, I\{t < T_{m+1}\}] \le \left( e^{\rho t} w(x) + \frac{b}{\rho}(e^{\rho t} - 1) \right) I\{\rho > 0\} + (w(x) + bt)\, I\{\rho = 0\}. \]

Here, constants $b$, $\rho$ and function $w$ come from Condition 1 (see footnote 5).

Proof: Suppose $\rho > 0$. As for the statement, we prove the following slightly stronger result (see footnote 6), i.e.,
$\forall\, m \in \mathbb{Z}^0_+$, $x \in S$, $n = 0, 1, \ldots, m$,

\[ E^\pi_x[w(\xi_t)\, I\{t < T_{m+1}\} \,|\, \mathcal{F}_{T_{m-n}}] \le I\{T_{m-n} \le t\}\, h(T_{m-n}, x_{m-n}, t) + \sum_{k=1}^{m-n} I\{T_{k-1} \le t < T_k\}\, w(x_{k-1}), \]

where $\mathcal{F}_{T_{m-n}} = \sigma(x_i, T_i : i \in \mathbb{Z}^0_+,\ 0 \le i \le m-n)$.
This stronger statement is proved inductively.
Consider $n = 0$. On the set $\{T_m \le t\}$, equation (4) implies

\[ P^\pi_x(\theta_{m+1} > t - T_m \,|\, \mathcal{F}_{T_m}) = e^{-\int_0^{t - T_m} \Lambda^m(S|h_m,v)\,dv}. \tag{19} \]

By the properties of conditional expectations and (19), we have

\[ E^\pi_x[w(\xi_t)\, I\{t < T_{m+1}\} \,|\, \mathcal{F}_{T_m}] = E^\pi_x[(I\{T_m \le t\} + I\{T_m > t\})\, w(\xi_t)\, I\{t < T_{m+1}\} \,|\, \mathcal{F}_{T_m}] \]
\[ = I\{T_m \le t\}\, w(x_m)\, P^\pi_x(\theta_{m+1} > t - T_m \,|\, \mathcal{F}_{T_m}) + \sum_{k=1}^{m} I\{T_{k-1} \le t < T_k\}\, w(x_{k-1}) \]
\[ = I\{T_m \le t\}\, w(x_m)\, e^{-\int_0^{t - T_m} \Lambda^m(S|h_m,v)\,dv} + \sum_{k=1}^{m} I\{T_{k-1} \le t < T_k\}\, w(x_{k-1}) \]
\[ \le I\{T_m \le t\}\, h(T_m, x_m, t) + \sum_{k=1}^{m} I\{T_{k-1} \le t < T_k\}\, w(x_{k-1}), \]

where the last inequality follows from (18).

Now suppose the stronger statement holds, $\forall\, 0 \le n < m$.

5 In this lemma, we temporarily ignore Remark 2.

6 Throughout this proof, this result is referred to as the "stronger statement".
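Consider the case of $n + 1$. By the properties of conditional expectations, the inductive supposition
and (19), we have

\[ E^\pi_x[w(\xi_t)\, I\{t < T_{m+1}\} \,|\, \mathcal{F}_{T_{m-n-1}}] = E^\pi_x\big[ E^\pi_x[w(\xi_t)\, I\{t < T_{m+1}\} \,|\, \mathcal{F}_{T_{m-n}}] \,\big|\, \mathcal{F}_{T_{m-n-1}} \big] \]
\[ \le E^\pi_x\Big[ I\{T_{m-n} \le t\}\, h(T_{m-n}, x_{m-n}, t) + \sum_{k=1}^{m-n} I\{T_{k-1} \le t < T_k\}\, w(x_{k-1}) \,\Big|\, \mathcal{F}_{T_{m-n-1}} \Big] \]
\[ = E^\pi_x\big[ I\{T_{m-n-1} \le t\}\, I\{T_{m-n} \le t\}\, h(T_{m-n}, x_{m-n}, t) \,\big|\, \mathcal{F}_{T_{m-n-1}} \big] \]
\[ \quad + E^\pi_x\Big[ I\{T_{m-n-1} \le t\} \sum_{k=1}^{m-n} I\{T_{k-1} \le t < T_k\}\, w(x_{k-1}) \,\Big|\, \mathcal{F}_{T_{m-n-1}} \Big] \]
\[ \quad + E^\pi_x\Big[ I\{T_{m-n-1} > t\} \sum_{k=1}^{m-n} I\{T_{k-1} \le t < T_k\}\, w(x_{k-1}) \,\Big|\, \mathcal{F}_{T_{m-n-1}} \Big] \]
\[ = I\{T_{m-n-1} \le t\} \Big\{ \int_0^{t - T_{m-n-1}} e^{-\int_0^u \Lambda^{m-n-1}(S|h_{m-n-1},v)\,dv} \int_{S \setminus \{x_{m-n-1}\}} \Lambda^{m-n-1}(dy|h_{m-n-1},u)\, h(T_{m-n-1} + u, y, t)\, du \]
\[ \quad + e^{-\int_0^{t - T_{m-n-1}} \Lambda^{m-n-1}(S|h_{m-n-1},v)\,dv}\, w(x_{m-n-1}) \Big\} + \sum_{k=1}^{m-n-1} I\{T_{k-1} \le t < T_k\}\, w(x_{k-1}) \]
\[ \le I\{T_{m-n-1} \le t\}\, h(T_{m-n-1}, x_{m-n-1}, t) + \sum_{k=1}^{m-n-1} I\{T_{k-1} \le t < T_k\}\, w(x_{k-1}), \]

where the last inequality follows from (18).

Hence, the stronger statement holds. It remains to put $n = m$ in the stronger statement to
obtain Lemma 2 for the case of $\rho > 0$.

The statement corresponding to the case of $\rho = 0$ follows from the fact that $\lim_{\rho \to 0}\{e^{\rho t} w(x) +
\frac{b}{\rho}(e^{\rho t} - 1)\} = w(x) + bt$. Here, we emphasize that if Condition 1 is satisfied by $\rho = 0$, it is also
satisfied by any arbitrarily fixed $\rho > 0$. □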



Lemma 3 Suppose Condition 1 is satisfied. For any fixed $l \in \mathbb{Z}^0_+$, consider the modified transition
rates defined by

\[ q_l(\cdot|x,a) = \begin{cases} q(\cdot|x,a), & \text{if } x \in S_l; \\ 0, & \text{if } x \in S \setminus S_l. \end{cases} \]

Their corresponding probabilities and expectations are denoted by $P^{\pi,l}_x$ and $E^{\pi,l}_x$. Then under any
policy $\pi$, $\forall\, x \in S$, $t \ge 0$,

\[ \lim_{l \to \infty} P^{\pi,l}_x(\xi_t \in S \setminus S_l) = 0, \tag{20} \]

where $S_l$ is defined in Condition 1(a).

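Proof: Throughout this proof, let $x \in S$ and $t \ge 0$ be arbitrarily fixed. Under Condition 1, we
have that $\forall\, \epsilon > 0$, $\exists\, J(\epsilon) > 0 : \forall\, l > J(\epsilon)$,

\[ \inf_{y \in S \setminus S_l} w(y) > \frac{e^{\tilde\rho t} w(x) + \frac{b}{\tilde\rho}(e^{\tilde\rho t} - 1)}{\epsilon}, \tag{21} \]

where $\tilde\rho = \rho + 1$ (see footnote 7).

Suppose the statement of this lemma does not hold, i.e., $\exists\, \epsilon > 0 : \forall\, L > 0$, $\exists\, l > \max\{L, J(\epsilon)\}$ :

\[ P^{\pi,l}_x(\xi_t \in S \setminus S_l) > \epsilon. \tag{22} \]

At the same time, necessarily, (21) holds as well. On the one hand, by using Lemma 2 and the
fact that the modified rates satisfy $\sup_{x \in S} \sup_{a \in A(x)} q_l(S \setminus \{x\}|x,a) \le \sup_{x \in S_l} \bar q_x < \infty$ (see Condition 1), we have

\[ E^{\pi,l}_x[w(\xi_t)] = \lim_{m \to \infty} E^{\pi,l}_x[w(\xi_t)\, I\{t < T_{m+1}\}] \le e^{\tilde\rho t} w(x) + \frac{b}{\tilde\rho}(e^{\tilde\rho t} - 1). \tag{23} \]

On the other hand, we have

\[ E^{\pi,l}_x[w(\xi_t)] = E^{\pi,l}_x[w(\xi_t) \,|\, \xi_t \in S \setminus S_l]\, P^{\pi,l}_x(\xi_t \in S \setminus S_l) + E^{\pi,l}_x[w(\xi_t) \,|\, \xi_t \in S_l]\, P^{\pi,l}_x(\xi_t \in S_l) \]
\[ \ge \inf_{y \in S \setminus S_l} w(y)\, \epsilon > e^{\tilde\rho t} w(x) + \frac{b}{\tilde\rho}(e^{\tilde\rho t} - 1), \]

where the first inequality follows from ignoring the second term in the first line and estimating
the first term from below using (22), and the last inequality is a result of (21). However, this
contradicts (23). □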

Proof of Theorem [TJ (a) From Q, we clearly have that V I G t > 0, 

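$$\begin{aligned} & P_x^{\pi}\left( (\xi_t = x_\infty) \cup \left( (\xi_t \ne x_\infty) \cap (\text{the process visits } S \setminus S_l \text{ at least once on } [0,t]) \right) \right) \\ &\quad = 1 - P_x^{\pi}(\forall\, v \in [0,t],\ \xi_v \in S_l) = 1 - P_x^{\pi,l}(\xi_t \in S_l) \\ &\quad = P_x^{\pi,l}\left( (\xi_t = x_\infty) \cup (\xi_t \in S \setminus S_l) \right) = P_x^{\pi,l}(\xi_t \in S \setminus S_l). \end{aligned} \tag{24}$$

Here, we have repeatedly used the fact that, for the modified rates, $\sup_{x \in S} \sup_{a \in A(x)} q_x(a) \le \sup_{x \in S_l} \bar q_x < \infty$, so that $P_x^{\pi,l}(T_\infty = \infty) = 1$. By using Lemma 3, (24) and the fact that $(S \setminus S_l)_{l \in \mathbb{Z}_+^0}$ is a decreasing system, we have $\forall\, t > 0$,

$$P_x^{\pi}\left( \bigcap_{l \in \mathbb{Z}_+^0} \left[ (\xi_t = x_\infty) \cup \left( (\xi_t \ne x_\infty) \cap (\text{the process visits } S \setminus S_l \text{ at least once on } [0,t]) \right) \right] \right) = 0,$$

which is equivalent to

$$P_x^{\pi}\left( \exists\, l \in \mathbb{Z}_+^0 :\ (\xi_t \ne x_\infty) \cap \left( (\xi_t = x_\infty) \cup (\forall\, v \in [0,t],\ \xi_v \in S_l) \right) \right) = 1,$$

i.e., for each $t > 0$, $P_x^{\pi}(\exists\, l \in \mathbb{Z}_+^0,\ \forall\, v \in [0,t],\ \xi_v \in S_l) = 1$. However, if $\xi_v \in S_l$ on $[0,t]$ a.s., then $T_\infty > t$ a.s., i.e., $P_x^{\pi}(T_\infty > t) = 1$. Since $t > 0$ is arbitrary, this leads to $P_x^{\pi}(T_\infty = \infty) = 1$ and $P_x^{\pi}(\xi_t \in S) = 1$, $\forall\, t > 0$. The statement regarding $E_x^{\pi}[w(\xi_t)]$ follows from this, Lemma 2 and that $\forall\, t > 0$,

$$E_x^{\pi}[w(\xi_t)] = E_x^{\pi}\left[ \sum_{m=0}^{\infty} w(\xi_t) I\{T_m \le t < T_{m+1}\} \right] = \lim_{m \to \infty} E_x^{\pi}\left[ w(\xi_t) I\{t < T_{m+1}\} \right].$$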

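(b) By definition, we have $V_0(x,\pi) = E_x^{\pi}\left[ \int_0^\infty e^{-\alpha t} \int_A c_0(\xi_{t-}, a)\, \pi(da|\omega,t)\, dt \right]$. Then, using Condition 2(b,c) and Theorem 1(a), we obtain

$$\begin{aligned} V_0(x,\pi) &\ge -E_x^{\pi}\left[ \int_0^\infty e^{-\alpha t} (M w(\xi_t) + c)\, dt \right] = -\int_0^\infty e^{-\alpha t} \left( M E_x^{\pi}[w(\xi_t)] + c \right) dt \\ &\ge -\int_0^\infty e^{-\alpha t} \left( M \left( e^{\rho t} w(x) + \frac{b}{\rho}(e^{\rho t} - 1) \right) + c \right) dt = -\frac{M(\alpha w(x) + b)}{\alpha(\alpha - \rho)} - \frac{c}{\alpha}. \end{aligned}$$

With Condition 2(a) in mind, the statement for $V_0(\pi) = \int_S V_0(x,\pi)\, \gamma(dx)$ follows.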



□ 
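The closed form at the end of part (b) can be checked numerically. The sketch below, under hypothetical constants satisfying $\alpha > \rho > 0$, integrates $e^{-\alpha t}\left(M\left(e^{\rho t} w(x) + \frac{b}{\rho}(e^{\rho t} - 1)\right) + c\right)$ with a composite Simpson rule on a long finite horizon and compares the result with $\frac{M(\alpha w(x) + b)}{\alpha(\alpha - \rho)} + \frac{c}{\alpha}$. All names and values are assumptions made for illustration.

```python
import math

def integrand(t, w_x, M, c, alpha, rho, b):
    """e^{-alpha t} * (M * (e^{rho t} w(x) + (b/rho)(e^{rho t} - 1)) + c)."""
    growth = math.exp(rho * t) * w_x + (b / rho) * math.expm1(rho * t)
    return math.exp(-alpha * t) * (M * growth + c)

def simpson(f, lo, hi, n):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0

# Hypothetical constants with alpha > rho > 0.
w_x, M, c, alpha, rho, b = 1.5, 2.0, 0.5, 1.0, 0.3, 0.7

# The horizon 80 truncates a tail of order e^{-(alpha - rho) * 80}, which is negligible here.
numeric = simpson(lambda t: integrand(t, w_x, M, c, alpha, rho, b), 0.0, 80.0, 40000)
closed_form = M * (alpha * w_x + b) / (alpha * (alpha - rho)) + c / alpha
```

The agreement relies on $\alpha > \rho$; for $\alpha \le \rho$ the integral diverges and the closed form is meaningless.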



7 If Condition 1 is satisfied by $\rho$ and $b$, then it is also satisfied by $\tilde\rho$ and $b$, where we recall $\tilde\rho = 1 + \rho$.
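Proof of Theorem 2 (a) Similarly to $\mu$ and $\nu$ defined earlier, let us define the following two random measures:

$$\tilde\mu(\omega, dt, \Gamma) = \sum_{m \ge 1} I\{T_m < \infty\}\, I\{x_{m-1} \in \Gamma\}\, I\{T_m \in dt\}, \quad \forall\, \Gamma \in \mathcal{B}(S)$$

and

$$\tilde\nu(\omega, dt, \Gamma) = \int_A \pi(da|\omega,t)\, q(S \setminus \{\xi_{t-}\} \,|\, \xi_{t-}, a)\, I\{\xi_{t-} \in \Gamma\}\, dt, \quad \forall\, \Gamma \in \mathcal{B}(S).$$

It is shown in the proof of [18, Lem.4] that $\tilde\nu$ is the dual predictable projection of $\tilde\mu$, i.e., for any nonnegative $\mathcal{P} \times \mathcal{B}(S)$-measurable function $Y(\omega, t, x)$,

$$E_x^{\pi}\left[ \int\!\!\int \tilde\mu(dt, dy)\, Y(t,y) \right] = E_x^{\pi}\left[ \int\!\!\int \tilde\nu(dt, dy)\, Y(t,y) \right];$$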



see Q3I1 Chap. 4, Sec. 5] for more details. Now it immediately follows that El \fi{{0,t],T)] < oo, 
because by using Condition HJc) and the definition of T given in the statement of this theorem, we 
have 



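$$E_x^{\pi}[\tilde\mu((0,t], \Gamma)] = E_x^{\pi}\left[ \int_0^t \int_A \pi(da|\omega,u)\, q_{\xi_{u-}}(a)\, I\{\xi_{u-} \in \Gamma\}\, du \right] \le t \sup_{y \in S_l} \bar q_y < \infty. \tag{25}$$

On the other hand, by Theorem 1, $\mu((0,t], \Gamma)$ and $\tilde\mu((0,t], \Gamma)$ are a.s. finite. Then it follows from their definitions that $|\mu((0,t], \Gamma) - \tilde\mu((0,t], \Gamma)| \le 1$ a.s. Therefore, $E_x^{\pi}[\mu((0,t], \Gamma)] < \infty$. Consequently, it is legal to take expectations on both sides of the following obviously valid equation

$$I\{\xi_t \in \Gamma\} = I\{\xi_0 \in \Gamma\} + \mu((0,t], \Gamma) - \tilde\mu((0,t], \Gamma) \quad \text{a.s.},$$

from which the statement follows.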

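(b) The reasoning for proving part (a) of this theorem can be repeated, except that now one needs to replace the argument for (25) by the following:

$$\begin{aligned} E_x^{\pi}[\tilde\mu((0,t], \Gamma)] &= E_x^{\pi}\left[ \int_0^t \int_A \pi(da|\omega,u)\, q_{\xi_{u-}}(a)\, I\{\xi_{u-} \in \Gamma\}\, du \right] \\ &\le E_x^{\pi}\left[ \int_0^t L\, w(\xi_{u-})\, I\{\xi_{u-} \in \Gamma\}\, du \right] \\ &\le L \int_0^t E_x^{\pi}[w(\xi_u)]\, du < \infty, \end{aligned}$$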

where the second inequality follows from Condition [H and the last inequality is due to Theorem 

HI □ 



Proof of Theorem 3 Step 1. We prove that equation (10) holds for $r(x) = u(x) I\{x \in S_l\}$, where $S_l$ is defined in Condition 1.
We obviously have



= El 
= El 



J A 
t 



ir(da\uj,v)q(dy \ {£ v }\£ v ,a)dv 

o J A 

n(da\uj,v) / w'(y)q(dy\{£ v }\Z v ,a)dv 



s 



(26) 



J A 



n(da\uj,v) / w'(y) {q{dy\£ v , a) - q({£ v }\£ v , a)I{£ v G dy}} dv 



< oo. 



8 Here, we clarify that $\mathcal{P} \times \mathcal{B}(S)$ denotes the product $\sigma$-algebra, rather than the Cartesian product.

Indeed, by Condition 5(a,b) and Theorem 1(a),

E x Tr(da\uj,v) / w'{y)q(dy\£ v ,a)dv 

JO J A Js 

L'pf f El [w(£ v )}dv + b't < oo, 
Jo 







<K 


[/'/ 


JO J A 



iv(da\<jj, v)(p'w (£„) + b')dv 



< 



and 



El 



J A 



n(da\w,v)w'(£ v )\q({(; v }\£ v> a)\dv 

< L' [ El [w(£ v )]dv < oo. 
Jo 

It follows from the previous calculations that 







= K 


Iff 

JO J A 



ir(da\u}, v)w (£«)<?£„ (a)dv 



(27) 



r(y)El 



J A 



< \\r\\ w , / w'(y)E 



TT(da\u, v)q(dy \ {£ v }\£ v ,a)dv 



■n(da\uj 1 v)q{dy \ {£, v }\£,v , a)dv 
o J A 



< oo, 



and 



El 



J A 



ir(da\uj, v)q^ (a)r(^ v )dv 



< oo. 



Now in order to establish equation (10) for $r(x) = u(x) I\{x \in S_l\}$, one only needs to integrate $r(x)$ over $S$ with respect to $P_x^{\pi}(\xi_t \in \cdot)$ and use Theorem 2.

Step 2. We prove that equation (10) holds for any $u(x) \in B_{w'}(S)$. By putting $S_{-1} = \emptyset$ and observing $E_x^{\pi}\left[ |u(\xi_t)|\, I\{\xi_t \in S_{l+1} \setminus S_l\} \right] < \infty$, we have



[«(&)] - - 



;=-i 



^ u(&)I{& G \ S,} 

i=-l 

OO 

= E * g \ si}] - < x )^ x e s ^ \ 

;=-l l=-l 

OO 



- 2J u^/fa G Si+i \ Si} 



/=-l 

oo 



z=-l 



JS J A 



w(da\uj, v)q{dy%,a)u{y)I{y G Si+i \ Si} 
7r(da|a;, v)q(dy\£ v , a)u(y)dv 



o Js J A 



where the second last equality follows from formally applying the result obtained in Step 1 of this proof, i.e., (10) holds for $r(x)$. The involved interchange of the order of integrations, summations and expectations is legal, as can be easily verified similarly to (26) and (27).

Step 3. We prove that equation (11) holds for any $u(x) \in B_{w'}(S)$. In this proof, we repeatedly apply (10) to $E_x^{\pi}[u(\xi_t)]$. On the one hand, we have



LHS of unj 



«(*) + El 
i 



n(da\uj, v)q(dy\^ v ,a)u(y)dv 



e~ at El 



JS J A 

n(da\u), v)q(dy\£ v ,a)u(y)dv 



u(x) 



o Js J A 



+ u(x)(e- at -1). 
On the other hand, we have the following two observations. Firstly,



El 



e- av (-au(£ v ))dv 



a / e~ av i u(x) + El 



a / e- av El[u(£ v )]dv 



ir(da\ui, r)q(dy\£ r , a)u(y)dr 



JS J A 



dv 



(e- at - l)u(x) - a / e~™EZ 



ir(da\uj, r)q(dy\£ r ,a)u(y)dr 



JSJA 



dv 



(e- at - l)u(x) - aEl 



n(da\uj, r)q(dy\£ r , a)u(y)dr > dv 



o JS J A 



where the interchange of the order of integrals in the first and the last equalities is legal, because 

''' ' 1 < oo and 



evidently, V ue B W ,{S), El J Q e - av a\u(£ v )\dv 

n(da\uj,r)q(dy\^r,a)\u\(y)dr 



e~ av El 



o 



JSJA 



dv < 



Secondly, integration by parts results in 



El 
= El 



e — l I n(da\uj,v)q(dy\^ v ,a)u(y)dv 

JSJA 

e~ at / / / Tr(da\u,r)q(dy\£ r ,a)u(y)dr 

JO JSJA 
t 



+aEl 



Tr(da\uj,r)q(dy\^ r ,a)u(y)dr dv 



o JSJA 



These two observations, together with the expression for LHS of (fTTj) obtained in the above, finally 
lead to 



RHS of CP 



El 



e~ av (-au^ v ))dv 



El 



(e~ at -l)u{x)+El 



JSJA 

ir(da\uj, r)q(dy\£ r ,a)u(y)dr 



ir(da\ui, v)q(dy\£, v , a)u(y)dv 

LHS of CEH), 



JSJA 



as required. 



□ 
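To make the discounted form of Dynkin's formula concrete, the following Python sketch checks it on a two-state chain whose transition probabilities are known in closed form. It assumes the formula reads $e^{-\alpha t} E_x[u(\xi_t)] = u(x) + E_x\left[\int_0^t e^{-\alpha v}\{-\alpha u(\xi_v) + \int_S q(dy|\xi_v) u(y)\}\, dv\right]$ for a fixed stationary policy (so the action is suppressed); the rates, the function $u$ and all names are hypothetical.

```python
import math

def probs_from_state0(t, a, b):
    """Transition probabilities of a two-state chain started at 0 (rate a: 0->1, rate b: 1->0)."""
    s = a + b
    p0 = b / s + (a / s) * math.exp(-s * t)
    return p0, 1.0 - p0

def expect(f, t, a, b):
    """E_0[f(xi_t)] for a function f given as a list over the two states."""
    p0, p1 = probs_from_state0(t, a, b)
    return p0 * f[0] + p1 * f[1]

def simpson(f, lo, hi, n):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0

# Hypothetical rates, discount factor, horizon and test function.
a, b, alpha, t_end = 1.3, 0.4, 0.8, 2.0
u = [1.0, -2.5]

# (Qu)(x) = sum_y q(y|x) u(y) for the generator Q = [[-a, a], [b, -b]].
gen_u = [a * (u[1] - u[0]), b * (u[0] - u[1])]
g = [gen_u[i] - alpha * u[i] for i in range(2)]

lhs = math.exp(-alpha * t_end) * expect(u, t_end, a, b)
rhs = u[0] + simpson(lambda v: math.exp(-alpha * v) * expect(g, v, a, b), 0.0, t_end, 2000)
```

In this bounded finite setting the interchange of expectation and integration is trivial; the conditions of the theorem are what justify it for unbounded rates.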



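Lemma 4 Suppose Condition 1(b) and Condition 6 are satisfied. Then $\forall\, u \in B_w(S)$, function $v$ given by

$$v(x) = \inf_{a \in A(x)} \left\{ \frac{c_0(x,a)}{\alpha + 1 + \bar q_x} + \frac{1 + \bar q_x}{\alpha + 1 + \bar q_x} \int_S u(y) \left( \frac{q(dy|x,a)}{1 + \bar q_x} + I\{x \in dy\} \right) \right\}$$

is measurable in $x \in S$.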

Proof: By Remark 4, Condition 1(b) and Condition 6, we refer to [12, Lem.8.3.7(a)] for that $\forall\, u \in B_w(S)$, $x \in S$, function$^{9}$ $\int_S u(y) \left( \frac{q(dy|x,a)}{1 + \bar q_x} + I\{x \in dy\} \right)$ is continuous in $a \in A(x)$. It follows from this and Condition 6(c) that $\forall\, x \in S$, $u \in B_w(S)$, function

$$\frac{c_0(x,a)}{\alpha + 1 + \bar q_x} + \frac{1 + \bar q_x}{\alpha + 1 + \bar q_x} \int_S u(y) \left( \frac{q(dy|x,a)}{1 + \bar q_x} + I\{x \in dy\} \right)$$

is lower semicontinuous in $a \in A(x)$. By [TJ Prop. 7.29], $\forall\, u \in B_w(S)$, function

$$\frac{c_0(x,a)}{\alpha + 1 + \bar q_x} + \frac{1 + \bar q_x}{\alpha + 1 + \bar q_x} \int_S u(y) \left( \frac{q(dy|x,a)}{1 + \bar q_x} + I\{x \in dy\} \right)$$

is measurable$^{10}$ on $K$. Now it remains to apply [11, D.5 Prop.] (see also [U Prop. 7.33]) for the statement of this lemma. □

9 It can be easily verified that $\forall\, (x,a) \in K$, $\left( \frac{q(dy|x,a)}{1 + \bar q_x} + I\{x \in dy\} \right)$ is a probability measure on $(S, \mathcal{B}(S))$.

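Proof of Theorem 4 Throughout this proof, $x \in S$ is arbitrarily fixed. Due to Lemma 4, functions $u^{(n)}$, $n = 0, 1, 2, \dots$ are measurable. Now the proof goes in steps.

Step 1. We prove that $\{u^{(n)}, n = 0, 1, \dots\}$ is a non-increasing sequence.

Straightforward calculations result in

$$\begin{aligned} u^{(1)}(x) &= \inf_{a \in A(x)} \left\{ \frac{c_0(x,a)}{\alpha + 1 + \bar q_x} + \frac{1 + \bar q_x}{\alpha + 1 + \bar q_x} \int_S u^{(0)}(y) \left( \frac{q(dy|x,a)}{1 + \bar q_x} + I\{x \in dy\} \right) \right\} \\ &= \inf_{a \in A(x)} \left\{ \frac{c_0(x,a)}{\alpha + 1 + \bar q_x} + \frac{1 + \bar q_x}{\alpha + 1 + \bar q_x} \int_S \left( \frac{M(\alpha w(y) + b)}{\alpha(\alpha - \rho)} + \frac{c}{\alpha} \right) \left( \frac{q(dy|x,a)}{1 + \bar q_x} + I\{x \in dy\} \right) \right\} \\ &\le \inf_{a \in A(x)} \left\{ \frac{c_0(x,a)}{\alpha + 1 + \bar q_x} \right\} + \frac{1 + \bar q_x}{\alpha + 1 + \bar q_x} \sup_{a \in A(x)} \left\{ \int_S \left( \frac{M(\alpha w(y) + b)}{\alpha(\alpha - \rho)} + \frac{c}{\alpha} \right) \left( \frac{q(dy|x,a)}{1 + \bar q_x} + I\{x \in dy\} \right) \right\} \\ &\le \frac{M w(x) + c}{\alpha + 1 + \bar q_x} + \frac{1 + \bar q_x}{\alpha + 1 + \bar q_x} \left( \frac{bM}{\alpha(\alpha - \rho)} + \frac{M(\rho w(x) + b)}{(\alpha - \rho)(1 + \bar q_x)} + \frac{M w(x)}{\alpha - \rho} + \frac{c}{\alpha} \right) = u^{(0)}(x), \end{aligned}$$

where the last inequality follows from Condition 1(b) and Condition 2(c). Now the result of Step 1 follows from this and the monotonicity of the RHS of (13) with respect to $u^{(n)}$.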
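Step 2. We prove that $\forall\, n = 0, 1, \dots$, $|u^{(n)}(x)| \le \frac{M(\alpha w(x) + b)}{\alpha(\alpha - \rho)} + \frac{c}{\alpha} = u^{(0)}(x)$.

On the one hand, the result of Step 1 implies that $\forall\, n = 0, 1, \dots$, $u^{(n)}(x) \le \frac{M(\alpha w(x) + b)}{\alpha(\alpha - \rho)} + \frac{c}{\alpha}$. On the other hand, we have that

$$\begin{aligned} u^{(1)}(x) &= \inf_{a \in A(x)} \left\{ \frac{c_0(x,a)}{\alpha + 1 + \bar q_x} + \frac{1 + \bar q_x}{\alpha + 1 + \bar q_x} \int_S u^{(0)}(y) \left( \frac{q(dy|x,a)}{1 + \bar q_x} + I\{x \in dy\} \right) \right\} \\ &= \inf_{a \in A(x)} \left\{ \frac{c_0(x,a)}{\alpha + 1 + \bar q_x} + \frac{1 + \bar q_x}{\alpha + 1 + \bar q_x} \int_S \left( \frac{M(\alpha w(y) + b)}{\alpha(\alpha - \rho)} + \frac{c}{\alpha} \right) \left( \frac{q(dy|x,a)}{1 + \bar q_x} + I\{x \in dy\} \right) \right\} \\ &\ge \inf_{a \in A(x)} \left\{ \frac{c_0(x,a)}{\alpha + 1 + \bar q_x} \right\} + \inf_{a \in A(x)} \left\{ \frac{1 + \bar q_x}{\alpha + 1 + \bar q_x} \int_S \left( \frac{M(\alpha w(y) + b)}{\alpha(\alpha - \rho)} + \frac{c}{\alpha} \right) \left( \frac{q(dy|x,a)}{1 + \bar q_x} + I\{x \in dy\} \right) \right\} \\ &\ge -\frac{M w(x) + c}{\alpha + 1 + \bar q_x} - \frac{1 + \bar q_x}{\alpha + 1 + \bar q_x} \inf_{a \in A(x)} \left\{ \int_S \left( \frac{M(\alpha w(y) + b)}{\alpha(\alpha - \rho)} + \frac{c}{\alpha} \right) \left( \frac{q(dy|x,a)}{1 + \bar q_x} + I\{x \in dy\} \right) \right\} \\ &\ge -\frac{M w(x) + c}{\alpha + 1 + \bar q_x} - \frac{1 + \bar q_x}{\alpha + 1 + \bar q_x} \sup_{a \in A(x)} \left\{ \int_S \left( \frac{M(\alpha w(y) + b)}{\alpha(\alpha - \rho)} + \frac{c}{\alpha} \right) \left( \frac{q(dy|x,a)}{1 + \bar q_x} + I\{x \in dy\} \right) \right\} \\ &\ge -\frac{M w(x) + c}{\alpha + 1 + \bar q_x} - \frac{1 + \bar q_x}{\alpha + 1 + \bar q_x} \left( \frac{bM}{\alpha(\alpha - \rho)} + \frac{M(\rho w(x) + b)}{(\alpha - \rho)(1 + \bar q_x)} + \frac{M w(x)}{\alpha - \rho} + \frac{c}{\alpha} \right) = -u^{(0)}(x), \end{aligned}$$

where the second inequality is because of Condition 2(c), $\frac{M(\alpha w(y) + b)}{\alpha(\alpha - \rho)} + \frac{c}{\alpha} \ge 0$ and the fact of $\frac{q(dy|x,a)}{1 + \bar q_x} + I\{x \in dy\}$ being a probability measure, and the last inequality follows from Condition 1(b). This and an inductive argument lead to that $\forall\, n = 0, 1, \dots$, $u^{(n)}(x) \ge -\left( \frac{M(\alpha w(x) + b)}{\alpha(\alpha - \rho)} + \frac{c}{\alpha} \right)$. Thus, Step 2 is completed.

Now it follows from the results of Step 1 and Step 2 that $u^*(x) = \lim_{n \to \infty} u^{(n)}(x)$ exists and $u^*(x) \in B_w(S)$. The fact that $u^*$ solves the Bellman equation (12) can be verified in exactly the same way as in the proof of [5, Lem.3.3(b)], and its proof is thus omitted. □

10 We emphasize that by Remark 4 we have that $\bar q_x$ is measurable on $S$.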
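The monotone iteration of Steps 1 and 2 can be illustrated on a small finite model. The Python sketch below applies the operator from Lemma 4 to a hypothetical two-state, two-action model with $w \equiv 1$, $\rho = 0$, $b = 0$, $c = 0$ and $M = \max |c_0|$, so that $u^{(0)} = M/\alpha$; the model data and all names are assumptions chosen for illustration and do not come from the paper.

```python
def bellman_step(u, q, c0, alpha):
    """One application of the operator from Lemma 4 on a finite model.

    q[x][a][y] are transition rates (each row sums to zero), c0[x][a] are cost rates.
    """
    n = len(u)
    new_u = []
    for x in range(n):
        actions = range(len(q[x]))
        q_bar = max(-q[x][a][x] for a in actions)  # largest exit rate at x
        denom = alpha + 1.0 + q_bar
        new_u.append(min(
            (c0[x][a] + sum(q[x][a][y] * u[y] for y in range(n)) + (1.0 + q_bar) * u[x]) / denom
            for a in actions
        ))
    return new_u

def value_iteration(u0, q, c0, alpha, n_iter=400):
    """Return the whole sequence u^(0), u^(1), ..., u^(n_iter)."""
    history = [list(u0)]
    for _ in range(n_iter):
        history.append(bellman_step(history[-1], q, c0, alpha))
    return history

# Hypothetical model: 2 states, 2 actions per state.
Q = [[[-1.0, 1.0], [-2.0, 2.0]],
     [[3.0, -3.0], [1.0, -1.0]]]
C0 = [[1.0, 0.5], [2.0, 3.0]]
ALPHA = 1.0
M = 3.0  # bound on |c0| when w is identically 1

history = value_iteration([M / ALPHA, M / ALPHA], Q, C0, ALPHA)
u_limit = history[-1]
```

For this model the limit satisfies $\alpha u(x) = \min_a \{c_0(x,a) + \sum_y q(y|x,a) u(y)\}$, which can be solved by hand to give $u(0) = 1$ and $u(1) = 1.25$.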



Lemma 5 Suppose Condition]]^ Condition \^a,b), Condition® and Condition® are satisfied. 
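Then under any policy $\pi$,

$$V_0(\pi) = E_\gamma^{\pi}\left[ \int_0^\infty e^{-\alpha t} \int_A \pi(da|\omega,t) \left\{ c_0(\xi_t, a) - \alpha u(\xi_t) + \int_S q(dy|\xi_t, a)\, u(y) \right\} dt \right] + \int_S \gamma(dy)\, u(y), \tag{28}$$

where $u \in B_{w'}(S)$ is an arbitrary function.

Proof: By applying Dynkin's formula (11) to $e^{-\alpha t} E_\gamma^{\pi}[u(\xi_t)]$, we have

$$e^{-\alpha t} E_\gamma^{\pi}[u(\xi_t)] = \int_S \gamma(dy)\, u(y) + E_\gamma^{\pi}\left[ \int_0^t e^{-\alpha v} \int_A \pi(da|\omega,v) \left\{ -\alpha u(\xi_v) + \int_S q(dy|\xi_v, a)\, u(y) \right\} dv \right].$$

The expectations of all particular summands are finite here. According to Theorem 1(b) (see also its proof), we can formally add $E_\gamma^{\pi}\left[ \int_0^t e^{-\alpha v} \int_A \pi(da|\omega,v)\, c_0(\xi_v, a)\, dv \right]$ to both sides of the above equation, and take the limit as $t \to \infty$. We emphasize that $\lim_{t \to \infty} e^{-\alpha t} E_\gamma^{\pi}[u(\xi_t)] = 0$ because of Theorem 1(a) and Condition 2(b). □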

The next lemma can be established in exactly the same way as in the proof of [5J Lem.5.3]. 

Lemma 6 Suppose Condition [7J Condition ®(a,b) 7 Condition® and Condition® are satisfied. 
Then under any fixed Markov policy $\pi$, $\forall\, x \in S$, the following assertions hold:

(a) If $u \in B_{w'}(S)$, and $\alpha u(x) \ge \int_A \pi(da|x,t)\, c_0(x,a) + \int_S \int_A \pi(da|x,t)\, q(dy|x,a)\, u(y)$, $\forall\, x \in S$, $t \ge 0$, then $u(x) \ge V_0(x, \pi)$.

(b) If $u \in B_{w'}(S)$, and $\alpha u(x) \le \int_A \pi(da|x,t)\, c_0(x,a) + \int_S \int_A \pi(da|x,t)\, q(dy|x,a)\, u(y)$, $\forall\, x \in S$, $t \ge 0$, then $u(x) \le V_0(x, \pi)$.

Proof of Theorem 5 (a) Using [11, D.5 Prop.] and the fact that $u^*$ solves the Bellman equation (12), we have that $\forall\, \epsilon > 0$, $\exists$ a deterministic stationary policy $\varphi$ :

$$c_0(x, \varphi(x)) - \alpha u^*(x) + \int_S q(dy|x, \varphi(x))\, u^*(y) \le \alpha \epsilon, \quad \forall\, x \in S.$$

It follows from this and Lemma 5 that $V_0(\varphi) \le \int_S \gamma(dy)\, u^*(y) + \epsilon$, and thus$^{11}$ $\inf_\varphi V_0(\varphi) \le \int_S \gamma(dy)\, u^*(y)$. On the other hand, by Lemma 5, we have that under any policy $\pi$, $V_0(\pi) \ge \int_S \gamma(dy)\, u^*(y)$. Now it is evident that $\int_S \gamma(dy)\, u^*(y) = \inf_\pi V_0(\pi) = \inf_\varphi V_0(\varphi)$. The proof for the existence of a deterministic stationary optimal policy is identical (with few very minor modifications) to the one of [8, Thm.3.3(c)], and thus omitted. The last statement is obvious.

(b) Let us arbitrarily fix some $x \in S$, and put $\gamma(\cdot) = \delta_x(\cdot)$. It is obvious that $\gamma$ satisfies Condition 2(a). Suppose now there is another solution $v^* \in B_{w'}(S)$ to the Bellman equation (12). But then it follows from part (a) of this theorem that $\inf_\pi V_0(\pi) = u^*(x) = v^*(x)$.

(c) We observe that the Bellman function $u^*$ is feasible for linear program (14). Consider any function $v$ that is also feasible for linear program (14). Therefore, by referring to Lemma 6(b), we have that under any Markov policy $\pi$, $v(x) \le V_0(x, \pi)$. Now suppose $\int_S \gamma(dy)\, v(y) > \int_S \gamma(dy)\, u^*(y)$. Then there exist some $x \in S$ and constant $\delta > 0$ such that $u^*(x) < v(x) - \delta$. Hence, $u^*(x) < V_0(x, \pi) - \delta$, where $\pi$ is any Markov policy. But this contradicts part (a) of this theorem. Therefore, any feasible solution $v$ to linear program (14) satisfies $\int_S \gamma(dy)\, v(y) \le \int_S \gamma(dy)\, u^*(y)$, as required.

11 Here, we recall that $\epsilon > 0$ is arbitrary.

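(d) From part (c) of this theorem, we know that the optimal value of linear program (14) is given by $\int_S u^*(y)\, \gamma(dy)$. Therefore, if some feasible solution $v$ to linear program (14) satisfies $u^*(x) = v(x)$ a.s. with respect to $\gamma$, then it solves the linear program, too. Hence we conclude the sufficiency part of the statement.

As for the necessity, let $v$ be any optimal solution to linear program (14). Suppose the relation $v = u^*$ a.s. with respect to $\gamma$ is false. Then there exist measurable subsets $\Gamma_1, \Gamma_2 \subseteq S$ such that the following conditions are satisfied: $\Gamma_1 \cap \Gamma_2 = \emptyset$, $v(x) > u^*(x)$ on $\Gamma_1$, $v(x) < u^*(x)$ on $\Gamma_2$, $v(x) = u^*(x)$ on $S \setminus \Gamma_1 \setminus \Gamma_2$, and the case $\gamma(\Gamma_1) = \gamma(\Gamma_2) = 0$ is excluded. Now let us define a function $\tilde v$ by $\tilde v(x) = I\{x \in S \setminus \Gamma_2\}\, v(x) + I\{x \in \Gamma_2\}\, u^*(x)$, which is feasible for linear program (14). Indeed, firstly, it is evident that $\tilde v \in B_{w'}(S)$. Secondly, we have that $\forall\, x \in S \setminus \Gamma_2$,

$$\begin{aligned} & \frac{1}{\alpha} c_0(x,a) - \tilde v(x) + \frac{1}{\alpha} \int_S \tilde v(y)\, q(dy|x,a) \\ &\quad = \frac{1}{\alpha} c_0(x,a) - v(x) + \frac{1}{\alpha} \int_{S \setminus \Gamma_2} v(y)\, q(dy|x,a) + \frac{1}{\alpha} \int_{\Gamma_2} u^*(y)\, q(dy|x,a) \\ &\quad \ge \frac{1}{\alpha} c_0(x,a) - v(x) + \frac{1}{\alpha} \int_{S \setminus \Gamma_2} v(y)\, q(dy|x,a) + \frac{1}{\alpha} \int_{\Gamma_2} v(y)\, q(dy|x,a) \ge 0, \end{aligned}$$

and $\forall\, x \in \Gamma_2$,

$$\begin{aligned} & \frac{1}{\alpha} c_0(x,a) - \tilde v(x) + \frac{1}{\alpha} \int_S \tilde v(y)\, q(dy|x,a) \\ &\quad = \frac{1}{\alpha} c_0(x,a) - u^*(x) + \frac{1}{\alpha} \int_{S \setminus \Gamma_2} v(y)\, q(dy|x,a) + \frac{1}{\alpha} \int_{\Gamma_2} u^*(y)\, q(dy|x,a) \\ &\quad \ge \frac{1}{\alpha} c_0(x,a) - u^*(x) + \frac{1}{\alpha} \int_{S \setminus \Gamma_2} u^*(y)\, q(dy|x,a) + \frac{1}{\alpha} \int_{\Gamma_2} u^*(y)\, q(dy|x,a) \ge 0. \end{aligned}$$

However, $\int_S \tilde v(y)\, \gamma(dy) = \int_{S \setminus \Gamma_2} v(x)\, \gamma(dx) + \int_{\Gamma_2} u^*(x)\, \gamma(dx) > \int_S v(x)\, \gamma(dx)$, which contradicts the optimality of $v$ for linear program (14). Now the necessity part follows. □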
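Parts (a) and (c) can be illustrated on a finite model, where everything is computable. The Python sketch below, using a hypothetical two-state, two-action model, computes $u^*$ by iterating the operator from Lemma 4, extracts a greedy deterministic stationary policy, evaluates that policy, and computes the slack $\frac{1}{\alpha} c_0(x,a) - u^*(x) + \frac{1}{\alpha} \sum_y q(y|x,a) u^*(y)$ of the linear program constraints. The model data and all names are assumptions for illustration only.

```python
def q_bar(q, x):
    """Largest exit rate from state x over all actions."""
    return max(-q[x][a][x] for a in range(len(q[x])))

def action_value(u, q, c0, alpha, x, a):
    """Bracket of the Lemma 4 operator for state x and action a."""
    qb = q_bar(q, x)
    flow = sum(q[x][a][y] * u[y] for y in range(len(u)))
    return (c0[x][a] + flow + (1.0 + qb) * u[x]) / (alpha + 1.0 + qb)

def iterate(u, q, c0, alpha, policy=None, n_iter=400):
    """Apply the operator repeatedly; minimise over actions unless a policy is fixed."""
    for _ in range(n_iter):
        u = [
            action_value(u, q, c0, alpha, x, policy[x]) if policy is not None
            else min(action_value(u, q, c0, alpha, x, a) for a in range(len(q[x])))
            for x in range(len(u))
        ]
    return u

# Hypothetical model: 2 states, 2 actions per state, rows of rates sum to zero.
Q = [[[-1.0, 1.0], [-2.0, 2.0]],
     [[3.0, -3.0], [1.0, -1.0]]]
C0 = [[1.0, 0.5], [2.0, 3.0]]
ALPHA = 1.0

u_star = iterate([3.0, 3.0], Q, C0, ALPHA)

# Greedy deterministic stationary policy with respect to u_star.
greedy = [min(range(len(Q[x])), key=lambda a, x=x: action_value(u_star, Q, C0, ALPHA, x, a))
          for x in range(len(Q))]
v_greedy = iterate([0.0, 0.0], Q, C0, ALPHA, policy=greedy)

# Slack of the LP constraints at u_star: nonnegative everywhere, zero at the greedy action.
lp_slack = [[C0[x][a] / ALPHA - u_star[x] + sum(Q[x][a][y] * u_star[y] for y in range(2)) / ALPHA
             for a in range(2)] for x in range(2)]
```

In this example the greedy policy attains the value $u^*$, and $u^*$ is feasible with a tight constraint in every state, in line with $u^*$ being optimal for the linear program.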



Proof of Theorem [6j (a) We take functions w and w' in the form 

w(x) - 



1, ifx = 0; 
=r, if x G (0,1]; 



w'(x) = 



1, iix = 0; 

if a: G (0,1], 



and put 5o = {0}, Si — So U (j^f, 1 , i = 1, 2, . . . . Now Condition QJa,c) is obviously satisfied. 



A 



Condition QJb) can be verified for p = 4A and 6 = as follows: 
if a; = then 



a)ui(y) = 5A / \y 4 dy - A = 4A = pu>(0); 

u 



if x G (0, 1] then 



q(dy\x,a)w(y) = —w(0) w(x) 



For Condition [21 it is sufficient to notice that Vx£ (0,1], 



< < pw(x). 



inf Co(x, a) = 

a£A(x) 



if ^ < A; 



Cix + C 2 4r - 4-, otherwise, 
$\inf_{a \in A(0)} c_0(0, a) = 0$, and $\alpha > 4\lambda = \rho$.

Condition [3] and Condition 0] are trivially satisfied because 

q x {a) = 

V x G (0, 1], A(ar) = [0, §], and A(0) = {0}. 

Condition (5jb,c,d) can be verified similarly to what is presented above by taking p' = ^p, 

6' = 0. Since Vie (0,1], g a < ^ an( i 9o = A, Condition |Sfa) is also satisfied. 
Finally, Condition [6] obviously holds. 

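(b) If we denote $z^{(n+1)} = f(z^{(n)})$ then, for $z > \epsilon > 0$, where $\epsilon > 0$ is any fixed constant, function $f$ is differentiable:

$$\frac{df}{dz} = \frac{-5\lambda}{\alpha + \lambda} \int_0^1 \frac{\partial u(y, z)}{\partial z}\, y^4\, dy,$$

where

$$\frac{\partial u(x, z)}{\partial z} = \frac{\alpha C_2 x^2 - \sqrt{\alpha^2 C_2^2 x^4 + C_1 C_2 x^3 + \alpha C_2 x^2 z}}{\sqrt{\alpha^2 C_2^2 x^4 + C_1 C_2 x^3 + \alpha C_2 x^2 z}} \in (-1, 0), \quad \forall\, x \in (0, 1],$$

so that $\forall\, z \in (\epsilon, \infty)$, $0 < \frac{df}{dz} < \frac{\lambda}{\alpha + \lambda} < 1$.

It remains to estimate $z^{(1)}$: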



C\x\ C\x 



u [1 \x) = -2aC 2 x 2 + 2^a 2 C 2 x* + CiC 2 x* < -2aC 2 x 2 + \ 2aC 2 x 2 + ) = V s G (0, 1] 

5ACl /%dy>l_gi> 



a(a + A) J 2a 
because a > 4A and Ci < 2a. The map z — > f(z) is contracting on [e, oo), e.g., for e = z^\ Since 



■ ,'10 ~ , a + X\ , 5A 
/ — C 2 A+ < 1 ' 



7 a 7 a + A 



2aC 2 x 2 + -C 2 X + ) x A dx 

7 a 



10 a + A 

— -zr<^2A H , 

7 a 



we conclude that z* < ^C 2 A + 2±*. 
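The fixed-point iteration $z^{(n+1)} = f(z^{(n)})$ can be sketched numerically. The Python code below assumes, as a reading of parts (b) and (c), that $u(x, z) = -2\alpha C_2 x^2 - z + 2\sqrt{\alpha^2 C_2^2 x^4 + C_1 C_2 x^3 + \alpha C_2 x^2 z}$ and $f(z) = 1 - \frac{5\lambda}{\alpha + \lambda} \int_0^1 u(y, z)\, y^4\, dy$ (the latter inferred from $u^*(0) = 1 - z^*$ and the formula for $df/dz$). The constants are hypothetical, chosen with $\alpha > 4\lambda$ and $C_1 < 2\alpha$.

```python
import math

# Hypothetical constants with ALPHA > 4 * LAM and C1 < 2 * ALPHA.
ALPHA, LAM, C1, C2 = 1.0, 0.2, 1.0, 1.0

def u(x, z):
    """Candidate u(x, z) for x in [0, 1] and z >= 0 (assumed form, see lead-in)."""
    root = math.sqrt(ALPHA**2 * C2**2 * x**4 + C1 * C2 * x**3 + ALPHA * C2 * x**2 * z)
    return -2.0 * ALPHA * C2 * x**2 - z + 2.0 * root

def f(z, n=2000):
    """f(z) = 1 - 5*LAM/(ALPHA+LAM) * int_0^1 u(y, z) y^4 dy, by composite Simpson (n even)."""
    h = 1.0 / n
    total = 0.0
    for k in range(n + 1):
        y = k * h
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight * u(y, z) * y**4
    return 1.0 - 5.0 * LAM / (ALPHA + LAM) * total * h / 3.0

def fixed_point(z0=0.0, tol=1e-12, max_iter=200):
    """Iterate z <- f(z) until successive iterates differ by less than tol."""
    z = z0
    for _ in range(max_iter):
        z_next = f(z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z

z_star = fixed_point()
# Central difference estimate of df/dz at the fixed point.
slope = (f(z_star + 1e-4) - f(z_star - 1e-4)) / 2e-4
```

With these constants the estimated slope lies in $(0, \lambda/(\alpha + \lambda))$, so the iteration contracts quickly.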

(c) Clearly, function $u^*(x)$ (supplemented by $u^*(0) = 1 - z^*$) is bounded; hence $u^* \in B_{w'}(S)$. Therefore, according to Theorem 5, it is sufficient to check that $u^*$ solves equation (12) and $\varphi^*$ provides the infimum.

The expression in the parentheses of (12) equals

$$\lambda \int_0^1 u^*(y)\, 5 y^4\, dy - \lambda u^*(0) \quad \text{if } x = 0,$$

and

$$C_1 x + C_2 a^2 - \frac{a}{x} + \frac{a}{x} u^*(0) - \frac{a}{x} u^*(x) \quad \text{if } x \in (0, 1].$$

Therefore,

$$u^*(0) = \frac{5\lambda}{\alpha + \lambda} \int_0^1 u^*(y)\, y^4\, dy$$

and $\varphi^*(x)$ as given earlier provides the infimum. (Note that $u^*(x) + z^* > -2\alpha C_2 x^2 + 2\sqrt{\alpha^2 C_2^2 x^4} = 0$.) Finally, at $x > 0$, the RHS of (12) equals $C_1 x - \frac{(u^*(x) + z^*)^2}{4 C_2 x^2}$, and equation

$$4\alpha C_2 x^2 u^*(x) = 4 C_1 C_2 x^3 - (u^*(x))^2 - 2 u^*(x) z^* - (z^*)^2$$

holds because

$$u^*(x) = -2\alpha C_2 x^2 - z^* + 2\sqrt{\alpha^2 C_2^2 x^4 + C_1 C_2 x^3 + \alpha C_2 x^2 z^*}.$$

□
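The final algebraic identity can be verified numerically for any $z \ge 0$, since it does not depend on $z$ being the fixed point. The Python sketch below uses hypothetical constants and a hypothetical value of $z$, and also checks that the unconstrained minimiser $a = \frac{u^*(x) + z}{2 C_2 x}$ of $C_1 x + C_2 a^2 - \frac{a}{x}(u^*(x) + z)$ reproduces $\alpha u^*(x)$. The constraint $a \in A(x)$ is ignored here, which is an assumption of this sketch.

```python
import math

# Hypothetical constants and a hypothetical value of z (any z >= 0 works for the identity).
ALPHA, C1, C2, Z = 1.0, 1.0, 1.0, 0.9

def u_star(x, z):
    """u*(x) = -2 a C2 x^2 - z + 2 sqrt(a^2 C2^2 x^4 + C1 C2 x^3 + a C2 x^2 z), a = ALPHA."""
    root = math.sqrt(ALPHA**2 * C2**2 * x**4 + C1 * C2 * x**3 + ALPHA * C2 * x**2 * z)
    return -2.0 * ALPHA * C2 * x**2 - z + 2.0 * root

def bracket(x, a, z):
    """C1 x + C2 a^2 - (a/x)(u*(x) + z), the expression minimised over a at x > 0."""
    return C1 * x + C2 * a * a - (a / x) * (u_star(x, z) + z)

xs = [0.1 * k for k in range(1, 11)]

# Residual of 4 a C2 x^2 u* = 4 C1 C2 x^3 - (u* + z)^2.
identity_residuals = [
    4.0 * ALPHA * C2 * x * x * u_star(x, Z) - (4.0 * C1 * C2 * x**3 - (u_star(x, Z) + Z) ** 2)
    for x in xs
]

# Residual of alpha u*(x) = min_a bracket(x, a), using the unconstrained minimiser.
bellman_residuals = []
minimiser_ok = []
for x in xs:
    a_opt = (u_star(x, Z) + Z) / (2.0 * C2 * x)
    bellman_residuals.append(ALPHA * u_star(x, Z) - bracket(x, a_opt, Z))
    minimiser_ok.append(
        bracket(x, a_opt, Z) <= bracket(x, a_opt + 0.01, Z)
        and bracket(x, a_opt, Z) <= bracket(x, a_opt - 0.01, Z)
    )
```

Both residual lists vanish up to rounding error, which matches the expansion of $(u^*(x) + z^*)^2$ in the displayed equation.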